

Digitized  by  the  Internet  Archive 
in  2018  with  funding  from 
University  of  Alberta  Libraries 


https://archive.org/details/criticalevaluatiOOpowe 


A  CRITICAL  EVALUATION  OF  THE  DETROIT  BEGINNING 
FIRST  GRADE  INTELLIGENCE  TEST 


A THESIS SUBMITTED TO THE SCHOOL OF GRADUATE STUDIES
IN  PARTIAL  FULFILMENT  OF  THE  REQUIREMENTS  FOR 
THE  DEGREE  OF  MASTER  OF  EDUCATION 


THE  UNIVERSITY  OF  ALBERTA 
FACULTY  OF  EDUCATION 


BY 

FREDERICK  WILLIAM  POWELL 
APRIL,  1956 


ACKNOWLEDGEMENTS 


The persons who gave access to the data used in this study have been most cooperative. Particular thanks are due to Mr. L. Kunelius, who supplied the data from the Westlock School Division, and to Mr. W. Wagner and Mr. G.R. Conquest, who gave access to the data from the Edmonton School system. Mr. J. Robson, secretary of the Vermilion S.D., and Miss M. Wilcox, assistant secretary, were most cooperative in supplying the data of the Vermilion sample. Thanks are also due to Mr. E. Miller, who was superintendent of the Vermilion S.D. at the beginning of the study and whose advice and encouragement were most helpful.

Dr. G.M. Dunlop, as chairman of the thesis committee, has guided the progress of this study. Without his counsel, encouragement, and enduring patience it would never have been completed.


SYNOPSIS 


Six samples of Detroit Beginning First Grade Intelligence Test scores were analyzed during the progress of this study: Vermilion, 1951; Westlock, 1950; Vermilion Rural small sample, 1950-53; Edmonton, 1948; Edmonton, 1949; and Edmonton, 1953.

All these samples, with the exception of Edmonton, 1953, were combined into a total distribution of scores. The Vermilion, 1951, sample of 250 cases was used for item analysis purposes.

The  item  analysis  showed  that  the  Detroit  Beginning  First 
Grade  Intelligence  Test  was  too  easy  for  the  age  group  to  which 
it  was  administered,  and  that  the  items  on  the  test  do  discriminate 
between  the  lower  and  higher  groups  of  ability. 

The  study  of  the  norms  for  each  month  of  chronological  age 
for  the  combined  distribution  of  Vermilion,  Westlock,  Edmonton, 
1948, and Edmonton, 1949, showed a selective factor to be in
operation  that  gave  a  positive  correlation  of  test  score  with 
chronological  age  for  the  earlier  age  groups,  and  a  negative 
correlation  of  test  score  with  chronological  age  for  the  later 
age  groups. 

The study of the small sample led to the conclusion that performance on the Detroit Beginning First Grade Intelligence Test is significantly affected by school experience. The Detroit I.Q.'s from tests administered after six months of school experience were found to be significantly higher than those from the Otis Alpha and Stanford-Binet.

An analysis of the I.Q. scores for a sample of 3512 cases from the Edmonton School System of 1953 showed a mean I.Q. of 108.18 with an S.D. of 16.45 I.Q. points, using the Detroit norms. Seventy per cent of the cases were found to be above I.Q. 100. The distribution of I.Q. scores was observed to be grossly different from the classification of David Wechsler, which corresponds closely with that of Terman. The inference to be drawn from this observation is that, while the Detroit Beginning First Grade Intelligence Test does discriminate between the different levels of intellectual ability, the I.Q.'s computed from the test are distributed in a classification that departs widely from such accepted standards as those of Terman and Wechsler.


TABLE OF CONTENTS

CHAPTER

I. THE PROBLEM

II. DESCRIPTION OF THE DETROIT BEGINNING FIRST GRADE
INTELLIGENCE TEST AND COMPARISON WITH OTHER GROUP
TESTS IN THE FIELD

The Kuhlmann-Anderson, Grade I, First Semester

California Short Form Test of Mental Maturity, Grades KGN-1

Detroit Advanced First Grade Intelligence Test

Goodenough Measurement of Intelligence by Drawings

S.R.A. Primary Mental Abilities, Ages 5-7

III. ITEM DIFFICULTY

IV. DISCRIMINATORY POWER

V. VALIDITY

VI. RELIABILITY

VII. NORMS

VIII. CASE STUDIES

IX. DO THE DETROIT NORMS YIELD AN I.Q. THAT IS TOO HIGH?

X. CONCLUSIONS

Item Difficulty

Discriminatory Power of the Items

Validity

Reliability

Norms

Case Studies

Is the Detroit I.Q. too High?

General Conclusion

BIBLIOGRAPHY

APPENDIX


LIST OF TABLES

TABLE

I. Percentages of Correct Responses with Corresponding Sigma Scores from an Arbitrary Zero Value of -3, Computed from Vermilion Sample

II. Percentages of Successes in Upper and Lower 27% of Vermilion Sample

III. Discrimination Values of Items in Detroit Beginning First Grade Intelligence Test

IV. Correlation of Detroit Beginning vs. Kuhlmann-Anderson for Sample of 159 Cases from Edmonton System

V. Statistics from the Analysis of Vermilion Sample Used to Derive Split-Half Reliability Coefficient

VI. Frequency Distribution of Scores, Edmonton, Westlock and Vermilion

VII. Deciles and Quartiles of Combined Distribution of Scores of Detroit Beginning First Grade Intelligence Test

VIII. Statistics from the Analysis of Detroit Intelligence Test Scores from the School Systems of Edmonton, Westlock and Vermilion

IX. Mean Scores for Each Month of Chronological Age for the Ages from 66 Months to 84 Months

X. Synopsis of Relevant Data from the Study of 17 Cases in the Vermilion Rural Small Sample

XI. I.Q. Ratings from Tests Administered to Small Sample Group During First 7 Months of School Experience

XII. Statistical Comparison of Detroit Beginning, Retest of Detroit Beginning, Detroit Advanced, Stanford-Binet Individual, Otis Alpha Form A, and California Test of Mental Maturity

XIII. Frequency Distribution of I.Q. Scores of Edmonton 1953 Sample

XIV. Statistics of the Distribution of I.Q. Scores for the Edmonton 1953 Sample

XV. Comparison of Distribution of Detroit I.Q.'s with Classification of David Wechsler


LIST OF FIGURES

FIGURE

1. Histogram of Weighted Scores of Vermilion Sample of 250 Cases with Superimposed Normal Curve of Same N and Range

2. Histogram of Combined Distribution of Scores with Superimposed Normal Curve

3. Scattergram of Observed Norms from Age 66 Months to 84 Months with Linear Regression Lines


CHAPTER I


THE  PROBLEM 

An intelligence test is a measuring instrument. Like every measuring instrument, it may have flaws in its structure due to theoretical misconceptions, or be subject to errors of administration that detract from its usefulness. Even the platinum-iridium bar in Paris has been called into question as the world's standard of length. It has been suggested that the green line at 5461 Å in the spectrum of mercury transmuted from gold be substituted.¹ This is mentioned merely to emphasize that there is nothing out of the ordinary in questioning a measuring standard, even one so widely accepted as the platinum-iridium bar in Paris. On a less rigorous basis, it is proposed here to question one of our well-known group intelligence tests, the Detroit Beginning First Grade Intelligence Test, which is widely used as a psychological measuring instrument in the province of Alberta.

¹Farrington Daniels, Outlines of Physical Chemistry (1954), p. 664.

A measure of a child's intelligence is a very valuable piece of information to the teacher or guidance officer. Ideally, perhaps, a teacher should administer an individual intelligence examination to each child, but for a class of twenty-five at least a week would be required. The alternative is a valid and reliable group test. It is the purpose of this research project to make as careful a study of one of our widely used intelligence tests as the available data and resources permit.

To the question, "Why administer an intelligence test at all?" many reasons may be given to justify the procedure. Modern education is a scientific process, and precise measurement is the foundation of all true science. If the teacher is to understand how to promote the child's development, it is important for him to know, as far as possible, how far that development has progressed and at what rate it may be expected to continue. Among other uses of general intelligence tests, the following seem important:

(1)  analysis  of  mental  growth 

(2)  classification  of  pupils  into  ability  groups 

(3)  educational  guidance 

(4)  measurement  of  the  results  of  varying  educational 
procedures 

(5)  study  of  patterns  of  delinquency. 

In  the  evaluation  of  any  psychometric  instrument,  four 
properties  must  receive  consideration.  These  are  the  objectivity, 
reliability,  validity,  and  standardization  of  the  test  or  battery 
of  tests  in  question.  This  study  will  undertake  to  present  an 
evaluation  of  the  Detroit  Beginning  First  Grade  Intelligence  Test 
using  these  properties  as  a  frame  of  reference. 


CHAPTER  II 


DESCRIPTION  OF  THE  DETROIT  BEGINNING  FIRST  GRADE 
INTELLIGENCE  TEST  AND  COMPARISON  WITH  OTHER 
GROUP  TESTS  IN  THE  FIELD 

The Detroit Beginning First Grade Intelligence Test is the work of Anna M. Engel and Harry J. Baker. Since it was copyrighted in 1921 and again in 1937, it is not altogether a newcomer to the testing scene. Its publishers are the World Book Company of New York. The battery consists of ten subtests with a total of sixty items. It is designed for administration to small groups of not more than twelve children at a time.

A variety of other first grade intelligence tests are in current use. Among the more familiar may be mentioned the Kuhlmann-Anderson, Grade I, First Semester; the California Test of Mental Maturity, Pre-primary, Grades KGN-I, S-Form; the Detroit Advanced First Grade Intelligence Test; Goodenough's Measurement of Intelligence by Drawings; and the S.R.A. Primary Mental Abilities for ages 5 and 6. These tests will be described briefly and discussed on the basis of their respective merits and weaknesses.

1. The Kuhlmann-Anderson, Grade I, First Semester

This battery by F. Kuhlmann and Rose Anderson was copyrighted in 1927, 1940, and 1942 and published by the Educational Test Bureau, Minneapolis. For Grade I, First Semester, the battery consists of ten tests with 78 items. Norms are published in terms of mental units derived by Kuhlmann from Heinis's curve for the growth of mental ability, as well as in the more familiar mental age system for evaluating mental ability.

2.  California  Short-Form  Test  of  Mental 
Maturity, Grades KGN-1

This  is  one  of  the  more  recent  psychological  instruments  in 
the  field.  It  was  published  in  1950  by  the  California  Test  Bureau, 
Los  Angeles,  under  the  authorship  of  Elizabeth  T.  Sullivan,  Willis 
W.  Clark  and  Ernest  Tiegs.  The  grade  range  is  kindergarten  through 
grade  one.  It  is  an  example  of  some  of  the  more  recent  work  in 
factor  analysis.  There  are  seven  tests  in  the  pre-primary  battery: 
spatial  relationships  -  2  tests,  logical  reasoning  -  2  tests, 
numerical  reasoning  -  2  tests,  and  verbal  concepts  -  1  test. 

3.  Detroit  Advanced  First  Grade 
Intelligence  Test 

This test by Harry J. Baker was copyrighted in 1928 and published by the World Book Company, New York. The norms were amended in 1945. This test is to be given to older beginners, or to beginners who have had two months of school experience.

4. Goodenough Measurement of
Intelligence by Drawings

Florence Goodenough developed this testing technique. The child is simply given a pencil and paper and told to draw a man.


5. S.R.A. Primary Mental Abilities
Ages 5-7

We have here a further example of a group intelligence test produced by the method of factor analysis. The authors are T.G. Thurstone and L.L. Thurstone. The publishers are Science Research Associates of Chicago. Five factors have been separated for testing in the primary battery: Verbal-Meaning, Perceptual Speed, Quantitative, Motor, and Space.

Having  identified  some  of  the  better  known  primary  intelligence 
tests,  let  us  examine  briefly  some  of  their  respective  merits  before 
passing  on  to  a  more  intensive  consideration  of  the  Detroit  Beginning 
First  Grade  Intelligence  Test  itself. 

The Kuhlmann-Anderson has the interesting feature of a system of mental units for representing mental ability. We have become so accustomed to the I.Q. method for representing intellectual capacity that it is refreshing to be reminded that it is not the only method. The California Test of Mental Maturity is an effective and well-standardized instrument. It will, as the authors claim, provide as reliable a measurement as, and more diagnostic information than, most group tests in current use. Its disadvantage is that the evidence for the separate factors seems inconclusive, and the incautious examiner is apt to be misled by the diagnostic possibilities of the separate factors when the total score is the most reliable indication of general intelligence. The Goodenough Measurement of Intelligence by Drawings has the advantage that an entire class can be tested in ten minutes and the tests scored at the rate of approximately two minutes apiece. Moreover, the results may be diagnostic of personality disorder when evaluated by an experienced psychologist.

The Detroit Beginning First Grade Intelligence Test has these advantages to recommend it at the beginning of the child's school experience: (1) the high percentage of easy items gives confidence to the beginner, (2) the absence of time limits is an advantage to the child who has had little previous experience with tests, and (3) the directions are clear and easy to follow.

Nevertheless, for all its recognized merits, it is not to be expected that, upon searching analysis, a psychological instrument would be found entirely free from weaknesses in material and structure. If weaknesses are uncovered in psychological tests developed with the aid of all the 'know-how' of contemporary psychological theory, it should come as no surprise if weaknesses are discovered in a test developed in the 1920's and not essentially changed since then.

The  intensive  study  of  the  Detroit  Beginning  First  Grade 
Intelligence  Test  will  begin  with  an  item  analysis  carried  out  on 
a sample from the Vermilion S.D. No. 25. The problem of item
difficulty  will  be  examined  first,  followed  by  an  investigation 
into  the  discriminatory  power  of  the  items. 


CHAPTER  III 


ITEM  DIFFICULTY 

The items which make up a psychological examination should be clear and unambiguous. They should yield a numerical score, and the scoring should be free from bias and personal error of the examiner. The items on the Detroit Beginning First Grade Intelligence Test are clear, the directions are explicit, and the scoring directions are precise. Little latitude is left for subjectivity in the scoring. It is entirely possible, however, for a test to be perfectly objective and yet prove defective because it is not sufficiently extensive in range of difficulty to discriminate between the lower and higher levels of mental ability. An item analysis will determine whether or not the items are of the appropriate range of difficulty and possess the necessary power of discrimination.

To provide a sample for an item analysis, tests were administered to 250 pupils beginning the first grade in the Vermilion S.D. No. 25 in the early weeks of September, 1951. The mean of the weighted scores of this sample was 66.62, with a standard deviation of 18.55. The mean of the unweighted scores was 43.56, with a standard deviation of 10.02. Of the cases, 124 were girls and 126 were boys.

The percentages of correct responses are given in Table I. Also in this table are the sigma scores which correspond to these percentages for a normal distribution. The sigma scores have been computed using Table A in Garrett² and are taken from an arbitrary zero point of -3.

²Henry E. Garrett, Statistics in Psychology and Education (1953), pp. 302-305.


TABLE I

PERCENTAGES OF CORRECT RESPONSES WITH CORRESPONDING
SIGMA SCORES FROM AN ARBITRARY ZERO VALUE OF -3,
COMPUTED FROM VERMILION SAMPLE

Item    % Correct  Sigma Score        Item    % Correct  Sigma Score
I    1       90.8         1.67        VI   1       88.3         1.99
I    2       91.2         1.63        VI   2       90.0         1.72
I    3       89.6         1.74        VI   3       80.3         1.69
I    4       53.3         2.89        VI   4       63.6         2.60
II   1       96.3         1.20        VI   5       33.2         2.87
II   2       89.6         1.73        VI   6       36.0         3.36
II   3       66.3         2.58        VI   7       42.0         2.80
II   4       72.0         2.32        VII  1       80.0         2.16
II   5       76.0         2.29        VII  2       69.2         2.50
II   6       63.3         2.63        VII  3       76.8         2.27
II   7       36.0         3.10        VII  4       61.2         2.72
III  1       86.8         1.88        VII  5       73.8         2.33
III  2       82.2         2.08        VII  6       24.8         3.67
III  3       83.0         2.01        VII  7       37.6         3.08
III  4       82.0         2.08        VIII 1       88.3         1.80
III  5       33.2         2.92        VIII 2       86.0         1.92
III  6       76.8         2.29        VIII 3       72.8         2.39
III  7       61.2         2.72        VIII 4       33.2         3.12
IV   1       93.6         1.29        VIII 5       27.2         3.61
IV   2       91.2         1.63        IX   1       86.0         1.92
IV   3       91.2         1.63        IX   2       63.6         2.65
IV   4       82.4         2.07        IX   3       53.2         2.66
IV   5       63.6         2.60        IX   4       27.2         3.61
IV   6       74.0         2.36        X    1       96.8         1.15
IV   7       30.0         3.23        X    2       93.6         1.38
V    1      100.0         0.00        X    3       66.3         2.58
V    2       97.6         1.02        X    4       72.0         2.32
V    3       96.0         1.23        X    5       50.3         2.99
V    4       93.3         1.31
V    5       82.3         2.07
V    6       83.3         1.80
V    7       93.6         1.38
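The sigma scores in Table I appear to follow a simple rule. Suppose an item is passed by p per cent of the sample. Its score is then the normal deviate above which p per cent of a unit normal curve lies, counted from an arbitrary zero at -3 sigma. For example, 90.8 per cent correct gives about 1.67, and 50.3 per cent gives about 2.99; both values match the table. The Python sketch below rests on that assumption. The function name `sigma_score` is illustrative, as are the clamping conventions: an item passed by everyone is set to 0.00, as Table I does for Item V 1, and an item failed by everyone is set to the top of a six-sigma scale. This is not the author's procedure, which read the values from Garrett's printed Table A.

```python
from statistics import NormalDist

def sigma_score(percent_correct: float, zero_point: float = -3.0) -> float:
    """Item difficulty in sigma units, measured from an arbitrary zero.

    Assumes the item sits at the point of a unit normal curve above which
    `percent_correct` per cent of the group falls.
    """
    p = percent_correct / 100.0
    if p >= 1.0:
        return 0.0                      # passed by all: clamp to the zero point
    if p <= 0.0:
        return -2 * zero_point          # failed by all: clamp to top of the scale
    z = NormalDist().inv_cdf(1.0 - p)   # deviate with p of the area above it
    return z - zero_point               # shift so that -3 sigma becomes 0

for pct in (90.8, 50.3, 24.8, 100.0):
    print(f"{pct:5.1f}% correct -> {sigma_score(pct):.2f}")
```

For Item VII 6 (24.8 per cent) the sketch gives 3.68 against the tabled 3.67. The small gap presumably reflects interpolation in the printed table.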


An examination of Table I leads to the observation that the battery is deficient in more challenging items. It has been taken for granted by competent psychologists that intellectual capacity is normally distributed. If this is accepted, a test to discriminate intellectual ability must achieve a balance between items upon which the lower levels of ability can succeed and items which can be passed by only a few at the higher levels. The item analysis would seem to show that the test lacks balance. To be specific, Item 1 in Test V was passed by 100% of the cases. An item such as this, which is passed by all, does nothing to discriminate between levels of intellectual ability; nor would an item failed by all measure anything. The most difficult item in the battery is Item 6 in Test VII, with a percentage of successes of 24.8. If the test were balanced about the 50 per cent level, the easiest item would have a percentage of successes of about 75.

The high percentages of successes on so many items need not be considered solely the fault of the instrument. It may be that the test is inappropriate to the age group to which it is being administered. If a test were too difficult for a class, we would expect a high percentage of failures; conversely, if it were too easy, we might expect the distribution to be skewed in the other direction.

The Detroit Beginning First Grade Intelligence Test is apparently too easy for this age group; how much too easy would require further testing and investigation to determine. The authors have recognized the weakness of their test in discriminating intellectual ability at the higher levels by weighting the more difficult items.




They affirm that this system of weighting³ results in a more normal distribution of scores and better differentiates the various degrees of brightness than did the original scoring, in which one point was assigned to each item. After studying the Vermilion data, one concludes that the distribution still departs in important particulars from the form of the normal distribution. This can be shown by applying tests for skewness to the distribution of weighted scores. For Pearson's coefficient of skewness,⁴ a value of .5 is significant; for the Vermilion distribution, -.56 was computed. The quartile coefficient of skewness is zero for a normal distribution and can range from +1 to -1; for the Vermilion distribution it was computed to be -.20. The moment coefficient of skewness of a normal distribution is also zero; for the Vermilion distribution it was computed to be -.60.
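The three skewness measures named above can be computed directly from a list of scores. The sketch below makes several assumptions. It takes "Pearson's coefficient" to mean the second, median-based form 3(mean - median)/S.D. It uses the quartile form (Q3 + Q1 - 2 Q2)/(Q3 - Q1), and the moment form m3/m2^(3/2) with population moments. The function name and the short list of scores are hypothetical; the Vermilion scores themselves are not reproduced in this study.

```python
from statistics import fmean, median, pstdev, quantiles

def skewness_coefficients(scores):
    """Return (Pearson median-based, quartile, moment) skewness coefficients."""
    mean = fmean(scores)
    sd = pstdev(scores)
    # Pearson's second coefficient: 3(mean - median) / S.D.
    pearson = 3 * (mean - median(scores)) / sd
    # Quartile coefficient: bounded between -1 and +1.
    q1, q2, q3 = quantiles(scores, n=4)
    quartile = (q3 + q1 - 2 * q2) / (q3 - q1)
    # Moment coefficient: third central moment over the cube of the S.D.
    n = len(scores)
    m2 = sum((x - mean) ** 2 for x in scores) / n
    m3 = sum((x - mean) ** 3 for x in scores) / n
    moment = m3 / m2 ** 1.5
    return pearson, quartile, moment

# Hypothetical scores piled up toward the ceiling, as with an easy test.
sample = [30, 52, 61, 66, 70, 73, 75, 77, 78, 80, 81, 82]
print(skewness_coefficients(sample))
```

All three coefficients come out negative for scores bunched at the top of the scale, the same direction as the Vermilion values.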

The shape of a distribution has a place in a discussion of item difficulty. If a test has a low ceiling, the scores pile up at the high end of the scale because there are too many easy items. Similarly, if the items on a test are too difficult, scores tend to pile up toward the lower end of the curve. An important fact about a seriously skewed distribution is that it cannot be approximated by the normal curve.

The range of mental ages on the Detroit norms is from 48 months to 116 months, a spread of 68 months. Assuming a normal distribution, the location of an individual on this scale should be determined by no more than six S.D. units of scores. The Detroit Beginning First Grade Intelligence scale has a range of 102 points. If the distribution of scores is normal, the mean should lie at the mid-point of the range of scores, that is, at 51. With a mean of 66.62 and a standard deviation of 18.55, the mean of the Vermilion distribution is .84 sigma above the mid-point of the range of Detroit weighted scores. With a range of 102 and an S.D. of 18.55, there are 5.5 sigma units in the scale of scores. Each sigma unit therefore corresponds to 12.4 months of mental age (68 months divided by 5.5), and a displacement of .84 sigma corresponds to 10.4 months of mental age. The test is thus approximately 10.4 months too easy, or the group is 10.4 months too old, for the best placement of the test if a normal distribution is to be expected. With an actual range of scores of 3.67 sigma, the group would appear even more mature than one from which we could expect to get a normal distribution of scores.

³Anna M. Engel and Harry J. Baker, Detroit Beginning First Grade Intelligence Test, Manual of Directions and Key (193?), p. 1.

⁴C. H. Richardson, An Introduction to Statistical Analysis (1944), pp. 150-169.
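The chain of arithmetic in the preceding paragraph can be checked in a few lines. The sketch below restates only figures given in the text: a 102-point score range, a 68-month mental-age spread, and the Vermilion mean of 66.62 and S.D. of 18.55. The function name is illustrative.

```python
def placement_offset_months(mean, sd, score_range=102.0, ma_spread_months=68.0):
    """How far, in months of mental age, the sample mean sits above the scale mid-point."""
    midpoint = score_range / 2                          # 51 on the weighted scale
    z_above_mid = (mean - midpoint) / sd                # about .84 sigma
    sigma_units = score_range / sd                      # about 5.5 units in the scale
    months_per_sigma = ma_spread_months / sigma_units   # about 12.4 months
    return z_above_mid, sigma_units, months_per_sigma, z_above_mid * months_per_sigma

z, units, per_sigma, offset = placement_offset_months(66.62, 18.55)
print(f"{z:.2f} sigma, {units:.1f} units, {per_sigma:.1f} months/sigma, {offset:.1f} months")
```

This prints `0.84 sigma, 5.5 units, 12.4 months/sigma, 10.4 months`. The S.D. appears in both the numerator and the denominator, so it cancels. The offset therefore reduces to (66.62 - 51) × 68/102, or about 10.4 months, whatever the S.D.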

The histogram of the weighted scores of the 250 cases in the Vermilion distribution has been drawn in Figure 1, with a superimposed normal curve of the same range and number on the same axis.


[Figure 1. Histogram of Weighted Scores of Vermilion Sample of 250 Cases with Superimposed Normal Curve of Same N and Range]


This gives a graphic picture of the asymmetry of the distribution. This asymmetry is not peculiar to the Vermilion distribution. As will be shown in Chapters VII and IX, all the distributions studied are skewed, and their means lie well above the mid-point of the range of scores.

In this chapter, the difficulty of each item on the Detroit Beginning First Grade Intelligence Test was determined by computing the sigma scores referred to an arbitrary zero of -3. From the observed standard deviation and the position of the observed mean above the mid-point of the range of scores, it was estimated that this sample was approximately 10.4 months too old for the best placement of the test.




CHAPTER  IV 


DISCRIMINATORY  POWER 

At least two things can be learned from an item analysis of a test: the level of difficulty of the items and their discriminative power. The level of difficulty of the items was discussed in the previous chapter, and their discriminatory power will be dealt with in this chapter. The same sample of 250 cases from the Vermilion S.D. No. 25 already used for determining the level of difficulty of the items will be used for determining their discriminatory power.

The discriminating power of test items is closely related to validity, and for validity a criterion is needed. Charles H. Lawshe gives two methods that are generally employed for selecting the criterion groups:¹ (1) the use of an external criterion and (2) the criterion of internal consistency.

When an external criterion is used, two groups that are known to differ with respect to the property being measured are tested, and the performance of each group on each item is determined. The test is deemed valid if it shows a satisfactory degree of discrimination between the two groups. This was presumably the method used by the authors of the Detroit Beginning First Grade Intelligence Test in the validation procedure described on page 2 of the manual to the test. In practice it is not easy to find clear-cut criterion groups differing only in the experimental variable, so an internal criterion is often used.

¹Charles H. Lawshe Jr., Principles of Personnel Testing, Ch. XIII, p. 1?3.


In using the criterion of internal consistency, we may set up the extreme portions of the group who have taken the test as criterion groups. In Lawshe's method the highest and lowest 25 per cent are used.² In Flanagan's method³ and in Davis's method⁴ the highest and lowest 27 per cent are used. We have computed our item analysis using 27 per cent of the total for our criterion groups. Table II gives the percentage of correct responses to each item in the test made by the criterion groups. Table III gives the discrimination indices estimated by the three methods.
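The selection of criterion groups described above can be sketched as follows. This is a minimal illustration under several hypothetical choices: each pupil's record holds a total score and a list of 0/1 item results, and pupils with equal totals keep their original order when sorted. The function name is also illustrative. With 250 cases, 27 per cent is 67.5 pupils, which Python's `round` turns into 68 per group.

```python
def criterion_group_percentages(totals, responses, fraction=0.27):
    """Percentage correct on each item in the upper and lower criterion groups.

    totals:    total test score for each pupil (hypothetical layout)
    responses: for each pupil, a list of 0/1 item results in item order
    fraction:  share of the sample placed in each extreme group
    """
    n = len(totals)
    k = round(n * fraction)  # pupils per criterion group; 250 * 0.27 -> 68
    # sorted() is stable, so pupils with equal totals keep their input order.
    order = sorted(range(n), key=lambda i: totals[i])
    lower, upper = order[:k], order[n - k:]
    result = []
    for item in range(len(responses[0])):
        pct_upper = 100 * sum(responses[i][item] for i in upper) / k
        pct_lower = 100 * sum(responses[i][item] for i in lower) / k
        result.append((round(pct_upper, 1), round(pct_lower, 1)))
    return result


# Toy data: ten pupils; item 0 is passed only by pupils scoring 5 or more,
# item 1 is passed by everyone.
totals = list(range(10))
responses = [[1 if t >= 5 else 0, 1] for t in totals]
print(criterion_group_percentages(totals, responses, fraction=0.3))
```

This prints `[(100.0, 0.0), (100.0, 100.0)]`. The first item separates the extreme groups completely; the second, passed by all, does not separate them at all.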


²Charles H. Lawshe Jr., "A Nomograph for Estimating the Validity of Test Items," Journal of Applied Psychology (1942), pp. 846-849.

³J. C. Flanagan, "General Considerations in the Selection of Test Items and a Short Method of Estimating the Product-Moment Coefficient from Data at the Tails of the Distribution," Journal of Educational Psychology (Dec. 1939), pp. 674-680.

⁴Frederick B. Davis, Item Analysis Data (1949).


TABLE II

PERCENTAGES OF SUCCESSES IN UPPER AND
LOWER 27% OF VERMILION SAMPLE

Item         %H       %L        Item         %H       %L
I    1     94.1     84.8        VI   1    100.0     70.0
I    2    100.0     84.8        VI   2    100.0     66.2
I    3     97.0     79.3        VI   3     98.5     55.5
I    4     64.7     44.1        VI   4     91.1     32.3
II   1    100.0     92.6        VI   5     75.0     36.8
II   2     98.3     83.8        VI   6     60.3      8.8
II   3     88.2     51.5        VI   7     69.1     23.5
II   4     94.1     48.5        VII  1     95.6     60.3
II   5     91.1     58.5        VII  2     94.1     39.2
II   6     89.3     44.1        VII  3     98.5     55.9
II   7     64.7     25.2        VII  4     91.1     17.6
III  1     98.5     61.7        VII  5     97.0     41.2
III  2     97.0     66.2        VII  6     57.3      1.5
III  3     98.5     50.0        VII  7     85.3      7.4
III  4     98.5     52.9        VIII 1     97.0     73.5
III  5     77.9     22.1        VIII 2     98.5     61.7
III  6     95.6     41.2        VIII 3     94.1     38.2
III  7     91.1     33.8        VIII 4     77.9     17.6
IV   1     98.5     88.3        VIII 5     58.8      4.4
IV   2     98.5     77.9        IX   1    100.0     63.2
IV   3    100.0     75.0        IX   2     92.6     29.4
IV   4     95.6     61.7        IX   3     76.3     26.5
IV   5     98.5     29.4        IX   4     61.7     10.3
IV   6     97.0     45.6        X    1    100.0     95.6
IV   7     76.3     11.8        X    2    100.0     85.3
V    1    100.0    100.0        X    3     91.1     48.5
V    2     98.5     95.6        X    4     98.5     51.5
V    3     98.3     94.1        X    5     83.8     25.0
V    4     97.0     85.3
V    5    100.0     61.7
V    6     98.5     69.1
V    7     95.6     88.2

TABLE III

DISCRIMINATION VALUES OF ITEMS IN DETROIT
BEGINNING FIRST GRADE INTELLIGENCE TEST

Item      DF    DL    DD        Item      DF    DL    DD
I    1   .20    .5    14        VI   1   .70   1.9    44
I    2     ?   1.3    32        VI   2   .73   2.0    47
I    3   .40   1.2    36        VI   3   .74   2.1    52
I    4   .23    .5    13        VI   4   .63   1.8    44
II   1   .35    .3    22        VI   5   .40   1.0    25
II   2   .50   1.2    37        VI   6   .60   1.6    38
II   3   .55   1.1    28        VI   7   .46   1.0    29
II   4   .55   1.6    38        VII  1   .60   1.5    37
II   5   .45   1.1    27        VII  2   .62   1.9    45
II   6   .53   1.3    33        VII  3   .68   2.0    52
II   7   .42   1.1    27        VII  4   .73   2.1    53
III  1   .70   2.0    43        VII  5   .67   2.1    52
III  2   .55   1.4    31        VII  6   .75   2.3    38
III  3   .70   2.2    55        VII  7   .77   2.6    59
III  4   .70   2.1    54        VIII 1   .55   1.2    30
III  5   .55   1.5    38        VIII 2   .65   1.9    49
III  6   .65   2.2    55        VIII 3   .64   1.9    46
III  7   .62   1.8    53        VIII 4   .60   1.7    42
IV   1   .50   1.3    31        VIII 5   .68   2.0    4?
IV   2   .5?   1.5    37        IX   1   .72   2.1    49
IV   3   .65   1.7    40        IX   2   .68   2.0    50
IV   4   .55   1.5    40        IX   3   .52   1.3    35
IV   5   .80   2.9    68        IX   4   .56   1.6    39
IV   6   .68   2.0    54        X    1   .30    .7    14
IV   7   .66   1.9    4?        X    2   .50   1.5    31
V    1   .00   0.0     0        X    3   .51   1.4    33
V    2   .20    .5    14        X    4   .76   2.2    64
V    3   .30    .6    19        X    5   .59   1.7    41
V    4   .35    .9    28
V    5   .75   2.0    49
V    6   .65   1.7    44
V    7   .25    .5    21


The index of discrimination in column DF has been computed by the Flanagan method. The Flanagan method uses a chart to estimate the product-moment coefficient of correlation from the percentages of correct responses at the tails of the distribution. Flanagan's chart is based on Tables VIII and IX in "Tables for Statisticians and Biometricians," Part II, edited by Karl Pearson.

The index of discrimination in column DL has been computed from Table I using Lawshe's nomograph. This index can vary between plus and minus 4.

The index of discrimination in column DD has been computed from Table I using Davis's table for estimating the discriminatory power of test items. This index can vary between plus and minus 99.

Only one item, V 1, shows a zero index of discrimination; all the others are positive. All three methods, then, show that the items in the Detroit Beginning First Grade Intelligence Test do discriminate between the higher and lower levels of ability. If a greater degree of discrimination were desired, the procedure would be to remove items of low discrimination from the test and add others of higher index value.

The three methods used in this item analysis were compared by means of the rank correlation coefficient, which is an appropriate statistic where the ranking, or relative order, of scores is important. The rank correlation between the orderings of the items by discriminatory power was .98 for Flanagan's method vs. Lawshe's method, .98 for Lawshe's method vs. Davis's method, and .99 for Flanagan's method vs. Davis's method.
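The rank correlation used to compare the three methods can be illustrated with a short Python sketch. It is not a reconstruction of the original hand computation: it assumes that tied indices share the average of their ranks and computes Pearson's r on those ranks, which gives the same value as the classical formula 1 - 6Σd²/(n(n² - 1)) when there are no ties. The function names are illustrative, and only items VI 1 to VI 7 of Table III are used, for brevity; the study itself ranked all sixty items.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` across a run of equal values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of 1-based positions start..end
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson_r(x, y):
    """Product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Rank correlation: Pearson's r computed on tie-averaged ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Flanagan and Lawshe indices for items VI 1 to VI 7, as read from Table III.
flanagan = [0.70, 0.73, 0.74, 0.63, 0.40, 0.60, 0.46]
lawshe = [1.9, 2.0, 2.1, 1.8, 1.0, 1.6, 1.0]
print(round(spearman_rho(flanagan, lawshe), 2))
```

Applied to these seven items, the sketch gives a coefficient of about .99.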

It would therefore appear that the Detroit Beginning First Grade Intelligence Test has been carefully constructed in terms of the discriminatory power of the items. No items were found with negative discrimination indices. The three methods used give arrays of discrimination indices whose rankings agree closely with one another, as shown by the rank coefficients of correlation. Only one item, No. 1 in Test V, showed zero discriminatory power. The item with the highest discrimination index by all three methods was item 5 in Test IV, with a discrimination index of .80 by Flanagan's method, 2.9 by Lawshe's method, and 68 by Davis's method.


CHAPTER  V 


VALIDITY 

An indispensable characteristic of a serviceable psychological test is validity. The test should measure precisely what it purports to measure and, as far as possible, nothing else. Furthermore, the factor measured must be measured without undue constant error.

Three steps must be considered in the construction of a valid test.

First of all, the maker must have a clear concept of what the test is to measure. The authors of the Detroit Beginning First Grade Intelligence Test do not state explicitly what they mean by general intelligence; they nevertheless set out confidently to measure it. That their test yields a satisfactory correlation with a measure of ability such as the Stanford-Binet individual intelligence examination lends weight to the opinion that their basic concept, while perhaps vague and undefined, must nevertheless have been sound.

Once a basic working concept of general intelligence has been determined, the next step is the selection of the test items. This appears to have been carefully done in the case of the Detroit Beginning First Grade Intelligence Test. Experience in primary work is needed to appreciate this aspect of the problem. The continued use of this test since its publication more than thirty years ago gives evidence of its serviceability. In considering this aspect of the Detroit test, it is well to bear in mind that it cannot be as precise a measuring instrument as some of the individual intelligence examinations, or as group tests designed for use after maturation and school training have had the opportunity to exert their stabilizing effects.

The third step in the development of a valid psychological instrument is to check it against an objective criterion. As the authors of the California Test of Mental Maturity state, there are no direct objective criteria for the validation of intelligence tests. The validity of such tests may be established indirectly by correlation with success in school work, by the ratings of competent judges, or by correlation with other tests of acknowledged validity.

Both Kuhlmann and Terman deny that the average teacher can estimate general intelligence with any degree of accuracy and consistency. Perhaps this is because of the subjectivity of such judgments, and perhaps it is because other factors, such as character and work habits, exert so profound an influence on school achievement. The glib student with a ready answer and the shy learner who responds reluctantly are equally likely to present a misleading picture of real intellectual ability to the superficial observation of the busy teacher. It is as impossible to make an accurate estimate of intelligence without a scientifically constructed intelligence scale as it is to measure character by the shape of the skull or by the lines on the palm of the hand.

If, then, it is not practicable to obtain reliable teacher estimates of children's intelligence to serve as a validating criterion for the Detroit Beginning First Grade Intelligence Test, the continued use of the test nevertheless affirms its utility and so gives some assurance of its validity.

During the analysis of the large sample from the Edmonton school system, 159 cases were found of beginners who had taken both the Detroit Beginning First Grade Intelligence Test and the Kuhlmann-Anderson group intelligence test of the appropriate level. This sample of 159 cases was a select group, since at that time the Kuhlmann-Anderson was administered only to corroborate the mental age found for children whose intelligence seemed too low for success in grade one. As might be expected from such a select group, the standard deviations are low and the correlation between scores on the two tests is not impressively high. A summary of the data from this analysis is given in Table IV.


TABLE IV

CORRELATION OF DETROIT BEGINNING vs. KUHLMANN-ANDERSON FOR SAMPLE OF 159 CASES FROM EDMONTON SYSTEM

                                   Detroit Beginning   Kuhlmann-Anderson
Mean M.A. (months)                       63.22               64.99
Variance of M.A. (square months)         34.69               94.27
S.D. of M.A. (months)                     5.89                9.71

Correlation of Detroit Beginning vs. Kuhlmann-Anderson: .54


The  difference  between  mean  mental  ages  on  these  tests  is  not 
highly  significant,  being  significant  at  the  .2  level  but  not  at 
the  .1  level. 
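When both tests are given to the same children, the standard error of the difference between the two means should allow for the correlation between the paired scores. The sketch below applies the usual formula for correlated means to the summary values of Table IV. The thesis does not state which standard-error formula produced the significance levels just quoted, so this illustrates the technique only and is not a reconstruction of the author's calculation; `t_correlated_means` is a hypothetical helper name.

```python
import math


def t_correlated_means(m1, s1, m2, s2, r, n):
    """Critical ratio for the difference between two means from the same n cases.

    SE_diff = sqrt((s1**2 + s2**2 - 2*r*s1*s2) / n); the -2*r*s1*s2 term allows
    for the correlation between the paired scores.
    """
    se_diff = math.sqrt((s1 ** 2 + s2 ** 2 - 2 * r * s1 * s2) / n)
    return (m2 - m1) / se_diff


# Summary values from Table IV: Detroit Beginning (1) and Kuhlmann-Anderson (2), 159 cases.
t = t_correlated_means(m1=63.22, s1=5.89, m2=64.99, s2=9.71, r=0.54, n=159)
print(f"critical ratio = {t:.2f}")
```

Setting `r=0` reduces the expression to the formula for independent samples, which makes the effect of the correlation term easy to see.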

The manual of the Detroit Beginning First Grade Intelligence Test contains some valuable validation data. The authors selected the Stanford-Binet for the purpose of validating their test. They report a correlation of .76 between Stanford-Binet mental ages and the weighted scores of 116 first grade children. Also in their manual, the authors call attention to the manner in which the items retained in the test were selected, as an argument in favor of its validity. On page two they give, as the basis for retaining each item, its efficiency in discriminating between pupils of differing ability. This seems a reasonable stand to take, since the scientific construction of a test may be considered an argument for its prima facie, or common-sense, validity.

It was demonstrated in Chapter IV that the Detroit Beginning First Grade Intelligence Test is a valid test from the point of view of the discriminative power of its items. We have seen that the correlation between the Kuhlmann-Anderson and the Detroit Beginning for a select sample is .54. The manual of the Detroit Beginning First Grade Intelligence Test cites a correlation of .76 between Stanford-Binet mental ages and the weighted scores of 116 first grade children. All this adds up to an impressive validity. But there is one other point of importance in considering the validity of intelligence test scores, and that concerns the absolute value of the I.Q.

Not only is it important that the group intelligence test exhibit a satisfactory correlation with the Stanford-Binet, but also that an I.Q. of 100 on the test have the same meaning as an I.Q. of 100 on the Stanford-Binet. Evidence will be presented in Chapters VIII and IX that this may not be the case with the Detroit Beginning First Grade Intelligence Test.




CHAPTER  VI 


RELIABILITY 

The consistency with which a test measures is indicated by its reliability coefficient. There are three types of reliability coefficient:¹ (1) the coefficient of equivalence, which indicates how much scores fluctuate from form to form of the same test; (2) the coefficient of stability, which measures the extent of fluctuation on the same questions from one time to another; and (3) the coefficient of stability and equivalence, which takes into account both the time-to-time and the form-to-form variations in test performance.

The existence of an alternative form of a test permits reliability to be established by retesting with that form. If the authors of the Detroit Beginning First Grade Intelligence Test had expended a little more effort and produced an alternative test at the same level, they might have discovered for themselves, before the test became widely publicized, that there were weaknesses in its structure, which they later attempted to correct by weighting the scores. The Detroit Advanced First Grade Intelligence Test may be used as an alternative for older beginners, but it differs from the Detroit Beginning in too many particulars to be considered an equivalent test: the level of difficulty is higher, there are fewer tests with more items, and the expedient of weighting has not been used in the scoring.

In the derivation of the coefficient of stability, the same test is given to the same group after an interval of time and the results are correlated. In carrying out this procedure the test maker may be faced with a dilemma. If the interval between administrations is short, the scores may be unduly influenced by practice effect and by orientation to the test procedure. If the interval is long, the scores may be altered by maturation or by learning that has taken place in the meantime.

¹Lee J. Cronbach, Essentials of Psychological Testing (1949), p. 65.

Performance on the Detroit Beginning First Grade Intelligence Test is markedly affected by school experience. For this reason the best estimate of reliability cannot be obtained for this test by retesting the same group after a period of time. The authors recognized this considerable modification of test performance by school experience when they recommended that the Detroit Advanced First Grade Intelligence Test be administered to older beginners, or to pupils who have had as much as two months of school experience. Nevertheless, the manual of the Detroit Beginning First Grade Intelligence Test does cite a reliability coefficient of .76 based on a retest of 407 cases after a four-month interval.

In a case such as this, where there is no alternative form and test performance is significantly altered by schooling and experience, the coefficient of equivalence can best be estimated by the split-half method or the Kuder-Richardson formula. This approach to the problem of reliability rests on the hypothetical equivalence of the parts of a single test given at a single administration. The commonest way of doing this is to give the test once, split it into halves, and correlate performance on one half with performance on the other half. The reliability coefficient so obtained is corrected by the Spearman-Brown prophecy formula, which depends upon the principle that a test's reliability is increased by increasing the number of items.

The split-half technique was used to derive a reliability coefficient from the unweighted scores of a sample of 65 cases comprising the entire beginning first grade class of the town of Vermilion in the fall of 1951. The statistics from the analysis of this sample are given in Table V.


TABLE V

STATISTICS FROM THE ANALYSIS OF VERMILION SAMPLE USED TO DERIVE SPLIT-HALF RELIABILITY COEFFICIENT

Number of Cases                  65
Range                            0-30
Mean of Odd Scores               23.67
S.D. of Odd Scores                3.81
Mean of Even Scores              23.65
S.D. of Even Scores               4.69
Correlation of Odd vs. Even        .80

The reliability coefficient of .80 must be corrected by the Spearman-Brown formula before it can be applied to the complete test. This correction gives a reliability coefficient of .89 for the whole test. The Detroit authors also used the split-half method, deriving a reliability coefficient from the administration of the test to an unselected group of 116 first grade children, using weighted scores. The result was a reliability coefficient of .91 when corrected by the Spearman-Brown formula. There were two important differences between the split-half coefficient derived in this study and that derived by the Detroit authors. In the first place, the samples differed in size: the Detroit authors used 116 cases, while the sample used in this study comprised 65 cases. In the second place, the Detroit authors used weighted scores, whereas unweighted scores were used in this study.

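Using the item analysis data for the 250 cases of the Vermilion distribution, a reliability coefficient can be derived with the Kuder-Richardson formula, $r = \frac{n}{n-1}\left(1 - \frac{\sum pq}{s^{2}}\right)$, in which n is the number of items, s is the standard deviation of total scores, p is the proportion of cases passing an item and q = 1 - p. Here n is 60, s is 10.2, and the sum of pq is 9.784. Substituting in the formula, we get a reliability coefficient of .91.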
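Kuder-Richardson formula 20 can also be computed directly from a matrix of item responses rather than from the summary values n, s and Σpq. The sketch below assumes 0/1 item scores and computes the total-score variance with N in the denominator, to match the pq terms; `kr20` and the small response matrix are hypothetical and are not data from the Vermilion sample.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson formula 20 for a persons-by-items matrix of 0/1 item scores.

    r = n/(n - 1) * (1 - sum(p*q) / s**2), with p the proportion passing each item,
    q = 1 - p, and s**2 the variance of total scores (N in the denominator).
    """
    n_people = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_people
        sum_pq += p * (1 - p)
    return n_items / (n_items - 1) * (1 - sum_pq / pvariance(totals))


# Hypothetical responses of four children to three items (1 = correct).
EXAMPLE = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(kr20(EXAMPLE))
```

For this small example the function returns .75: the total scores 3, 2, 1, 0 have variance 1.25, the sum of pq is .625, and 3/2 × (1 - .625/1.25) = .75.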

The close correspondence between these different coefficients of reliability would suggest that chance contributes little to scores on this test. The probability of a pure chance score on this test has not been computed, but a quick examination of the items will show why it cannot be high: many items require multiple responses, which reduces the probability of guessing the correct answer.
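Although the chance score was not computed in the study, the effect of multiple responses on guessing can be illustrated. The sketch assumes, hypothetically, that an item is passed only when every one of its required marks is correct and that each mark is a blind, independent choice among equally likely alternatives; the numbers of alternatives and marks are illustrative and are not taken from the Detroit items.

```python
from fractions import Fraction


def chance_success(alternatives, marks_required):
    """Probability of passing an item by blind guessing when it needs `marks_required`
    independent marks, each placed on one of `alternatives` equally likely choices."""
    return Fraction(1, alternatives) ** marks_required


def expected_chance_score(items):
    """Expected number of items passed by guessing alone.

    `items` holds one (alternatives, marks_required) pair per item.
    """
    return sum(chance_success(a, m) for a, m in items)


# Hypothetical items: one mark among four pictures versus three marks, each among four pictures.
print(chance_success(4, 1), chance_success(4, 3))
```

Under these assumptions, requiring three correct marks instead of one lowers the chance of passing an item from 1/4 to 1/64.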

In summary, the Detroit authors have quoted a reliability coefficient of .91 for an unselected sample of 116 first grade children. Using a sample of 65 cases and unweighted scores, a split-half reliability coefficient of .89 was derived during this study. Using the item analysis data from the Vermilion sample and the Kuder-Richardson formula, a reliability coefficient of .91 was derived from a sample of 250 cases.


CHAPTER  VII 


NORMS 

A measuring instrument is valueless if it is not standardized. There are absolute measures, such as the pound of mass, which are constant throughout the universe, and relative measures, such as the pound of weight, which are constant for a particular locality. The relative measure is usually the easier to verify and to work with. It can be derived from the absolute measure if we know its characteristic deviations.

In the field of intelligence testing we do not yet have what may be considered an absolute measure of general intelligence. It is the usual procedure of practical psychologists to use one of the individual intelligence examinations as a criterion when establishing standards for group intelligence tests. The individual intelligence examination most widely used for this purpose is the Stanford-Binet.

Two serious problems confront the psychologist who sets out to establish norms for a test of general intelligence: "What is the nature of general intelligence?" and "What factors determine general intelligence?" So many diverse definitions of general intelligence have been given by psychologists that one is led to suspect that general intelligence is not properly definable at all, but is a basic intuitive concept fundamental to the science of psychology.

Nevertheless, if tests are to be developed, a sound working definition of intelligence needs to be formulated. There are three fundamental factors involved in our concept of general intelligence: (1) heredity, (2) maturation, and (3) education in its widest sense, as the concerted social tendency to impart information and to develop the potentialities of the evolving personality. In practice, the development of mental factors is considered to begin at or soon after birth and to continue until its arrest some time after the twelfth year. Mental growth as a function of time levels off after maturity and begins to decline with progressive deterioration in senescence. The development of general intelligence over a period of time shows a general trend that may be plotted as a curve from which norms can be derived for intermediate points. The mental growth curve is not a simple function, however, and until all factors, including the nature of the human mind itself, are better understood, it is safest to stay close to empirical data. The empirical data from which the authors of the Detroit Beginning First Grade Intelligence Test developed their norms are more extensive and comprehensive than could possibly be attained in a minor research project such as this. It might, however, have proved interesting to develop local norms on the basis of the few thousand tests available. A detailed study of the available samples will reveal why this was not found practicable. Let us examine the combined distribution of scores, which is presented in Table VI.




TABLE VI

FREQUENCY DISTRIBUTION OF SCORES, EDMONTON, WESTLOCK AND VERMILION

Class Limits      Mid Point   Frequency
 99.5 to 104.5       102           4
 94.5 to  99.5        97          44
 89.5 to  94.5        92         128
 84.5 to  89.5        87         256
 79.5 to  84.5        82         393
 74.5 to  79.5        77         501
 69.5 to  74.5        72         523
 64.5 to  69.5        67         478
 59.5 to  64.5        62         465
 54.5 to  59.5        57         405
 49.5 to  54.5        52         326
 44.5 to  49.5        47         213
 39.5 to  44.5        42         150
 34.5 to  39.5        37         152
 29.5 to  34.5        32         124
 24.5 to  29.5        27          83
 19.5 to  24.5        22          55
 14.5 to  19.5        17          48
  9.5 to  14.5        12          32
  4.5 to   9.5         7          14
 -0.5 to   4.5         2           8

Percentiles may be computed from this table. The median and the 1st and 3rd quartiles are needed to compute the quartile deviation and the quartile coefficient of skewness. The 10th and 90th percentiles are needed to compute the percentile coefficients of skewness and kurtosis. The deciles and quartiles of the combined distribution of scores are presented in Table VII.
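Because Table VI groups scores into five-point classes, percentiles must be interpolated within the class that contains them. The sketch below uses the common grouped-data formula, which assumes that scores are spread evenly through each class. The thesis does not say how Table VII was computed, but with the Table VI frequencies the sketch gives a median of about 65.8, quartiles of about 52.9 and 76.8, and 10th and 90th percentiles of about 37.0 and 84.4, which round to the corresponding values in Table VII. The percentile coefficient of kurtosis, Q/(P90 - P10), is included as well; it is about .263 for a normal distribution. The names used are illustrative.

```python
# Table VI as (lower real limit, frequency) pairs, in ascending order; class interval 5.
CLASSES = [
    (-0.5, 8), (4.5, 14), (9.5, 32), (14.5, 48), (19.5, 55), (24.5, 83),
    (29.5, 124), (34.5, 152), (39.5, 150), (44.5, 213), (49.5, 326),
    (54.5, 405), (59.5, 465), (64.5, 478), (69.5, 523), (74.5, 501),
    (79.5, 393), (84.5, 256), (89.5, 128), (94.5, 44), (99.5, 4),
]
WIDTH = 5


def grouped_percentile(classes, width, pct):
    """Percentile of grouped data: L + (pct/100 * N - cf_below) / f * width,
    where L is the lower real limit of the class containing the percentile."""
    n = sum(f for _, f in classes)
    target = pct / 100 * n
    cum = 0  # cases below the current class
    for lower, f in classes:
        if cum + f >= target:
            return lower + (target - cum) / f * width
        cum += f
    raise ValueError("percentile lies outside the distribution")


q1 = grouped_percentile(CLASSES, WIDTH, 25)
median = grouped_percentile(CLASSES, WIDTH, 50)
q3 = grouped_percentile(CLASSES, WIDTH, 75)
p10 = grouped_percentile(CLASSES, WIDTH, 10)
p90 = grouped_percentile(CLASSES, WIDTH, 90)
quartile_deviation = (q3 - q1) / 2
kurtosis = quartile_deviation / (p90 - p10)  # about 0.263 for a normal distribution
print(f"Q1={q1:.1f}  Md={median:.1f}  Q3={q3:.1f}  Q={quartile_deviation:.1f}  Ku={kurtosis:.3f}")
```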




TABLE VII

DECILES AND QUARTILES OF COMBINED DISTRIBUTION OF SCORES OF DETROIT BEGINNING FIRST GRADE INTELLIGENCE TEST

Decile     Score
10          102
 9           84
 8           79
 7           75
 6           70
 5           66
 4           60
 3           56
 2           50
 1           37

Quartile   Score
3rd          77
2nd          66
1st          53

The quartile deviation is 12. It is interesting to note that the lowest 10% of the scores are spread over a range of 37, which is almost a third of the total range of scores. It seems that the lower scores could have been used more efficiently, particularly when it is realized that a score of one represents a mental age of 42 months.

Some further interesting features of the distribution of Detroit Intelligence Test scores in Alberta school systems were brought to light during the analysis of samples from the school systems of Westlock, Vermilion and Edmonton. Some of the statistics computed for these samples are given in Table VIII. All scores in this table are weighted.




TABLE VIII

STATISTICS FROM THE ANALYSIS OF DETROIT INTELLIGENCE TEST SCORES FROM THE SCHOOL SYSTEMS OF EDMONTON, WESTLOCK AND VERMILION

School System  Year     N    Mean Age    S.D. of   Mean    S.D. of   Skewness
                             (months)    Age       Score   Scores    (Pearson's)
Edmonton       1948   1838   73.?        4.07      62.28   18.44     ?
Edmonton       1949   2144   73.82       4.32      63.71   18.00     -.46
Westlock       1950    170   76.09       6.59      64.80   17.05     -.78
Vermilion      1951    250   75.83       5.14      66.62   18.55     -.56
Combined              4402   73.87       4.52      63.32   18.20     -.48

A question mark marks a value illegible in the source.
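The Combined row of Table VIII can be checked against the four sample rows: the combined mean is the N-weighted mean of the sample means, and the combined variance adds the spread within each sample to the spread of the sample means about the combined mean. The sketch below assumes that each S.D. was computed with N in the denominator; under that assumption it reproduces N = 4402 and the mean of 63.32, and gives an S.D. of about 18.21, close to the tabled 18.20. The helper name `pool` is illustrative.

```python
from math import sqrt

# (N, mean score, S.D. of scores) for each sample row of Table VIII.
SAMPLES = {
    "Edmonton 1948": (1838, 62.28, 18.44),
    "Edmonton 1949": (2144, 63.71, 18.00),
    "Westlock 1950": (170, 64.80, 17.05),
    "Vermilion 1951": (250, 66.62, 18.55),
}


def pool(groups):
    """Combine (n, mean, sd) triples into the N, mean and S.D. of all scores together.

    Pooled variance = sum(n * (sd**2 + (mean - grand_mean)**2)) / N, which treats each
    S.D. as computed with n (not n - 1) in the denominator.
    """
    groups = list(groups)
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    variance = sum(n * (sd ** 2 + (m - grand_mean) ** 2) for n, m, sd in groups) / n_total
    return n_total, grand_mean, sqrt(variance)


n, m, sd = pool(SAMPLES.values())
print(f"N={n}  mean={m:.2f}  S.D.={sd:.2f}")
```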

The histogram of the total distribution of scores appears in Figure 3, together with a superimposed normal curve. It will be noted that the distribution of Detroit scores is bimodal, but the mode at X = 37 is not markedly above the succeeding ordinate. The distribution is negatively skewed. The amount of skewness, -.48, has been computed using Pearson's formula. As pointed out in Chapter III, a skewness of .5 would be considered large as computed by this formula. The skewness of the combined distribution is not as large as that computed for the Vermilion distribution, probably because the combined distribution as a whole is a younger group than the Vermilion sample. Two further measures of skewness have been computed for the total distribution: Bowley's quartile coefficient of skewness, -.08, and the moment coefficient of skewness, -1.68.
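The skewness coefficients mentioned above can be written as small functions. Bowley's quartile coefficient with the quartiles of Table VII, (77 + 53 - 2 × 66)/(77 - 53), gives -.08, as reported. Two versions of Pearson's coefficient are in common use, (mean - mode)/S.D. and 3(mean - median)/S.D., and the text does not say which was used. If the mode is taken as 72, the midpoint of the most populous class in Table VI, the first version gives about -.48 for the combined distribution; this is offered only as a plausible reading of the reported value, not as the author's confirmed procedure.

```python
def pearson_mode_skewness(mean, mode, sd):
    """Pearson's first coefficient of skewness: (mean - mode) / sd."""
    return (mean - mode) / sd


def pearson_median_skewness(mean, median, sd):
    """Pearson's second coefficient of skewness: 3 * (mean - median) / sd."""
    return 3 * (mean - median) / sd


def bowley_skewness(q1, median, q3):
    """Bowley's quartile coefficient of skewness: (Q3 + Q1 - 2*Md) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * median) / (q3 - q1)


# Combined distribution: quartiles from Table VII, mean and S.D. from Table VIII.
print(round(bowley_skewness(q1=53, median=66, q3=77), 2))
print(round(pearson_mode_skewness(mean=63.32, mode=72, sd=18.20), 2))  # mode assumed to be 72
```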




The chief defect of fit comes from the position of the mean, close to two-thirds of a standard deviation above the mid-point of the range of scores. This suggests that the test is too easy for this age group. The number of items on which a successful response can be expected decreases as the age of the child decreases. It could be predicted from the data of this study that a younger group, other things being equal, would give a mean closer to 51 and a better fit with the normal distribution.

The  mean  score  was  computed  for  each  month  in  the  range  where  there 
were  sufficient  cases  to  provide  an  adequate  sample.  These  data  are 
presented  in  Table  IX. 


TABLE IX

MEAN SCORES FOR EACH MONTH OF CHRONOLOGICAL AGE FOR THE AGES FROM 66 MONTHS TO 84 MONTHS

Age      N    Mean Score       Age      N    Mean Score
66      39       58.2          76     381       66.5
67     107       54.5          77     350       67.8
68     223       58.0          78     278       69.1
69     287       57.2          79     168       67.4
70     340       60.4          80     136       66.8
71     404       58.4          81      81       63.0
72     352       62.8          82      53       65.3
73     330       62.8          83      38       72.4
74     350       66.4          84      21       60.3
75     322       66.1

Ages are in months.

The 4320 cases in the above table are from the school systems of Edmonton, Vermilion and Westlock.


For further clarity, the above results are shown graphically in Figure 3, together with the curve of the Detroit norms for comparison. It will be noted that there is a fairly uniform trend toward increase in score with increase in chronological age from 66 months to 78 months. After the age of 78 months the scores begin to decline and to scatter. This must be due to a selective factor in the population under study; it cannot be due to an educational factor, since all the cases are beginners. It is this selective factor which vitiates the sample for the derivation of norms.

In the controversy over the course of mental growth, there have been protagonists of the view that the final arrest of mental growth is the culmination of a levelling-off process setting in after the rapid development of early childhood; but there are none for the view that mental ability declines between the ages of 78 and 84 months. The cases from 78 to 84 months are a more slowly maturing group, as shown by their test performance in comparison with the younger ages in the sample. They have been kept at home and not sent to school as soon as they reached school age. Exactly why they began attending school later than the greater part of the sample would furnish an interesting topic for further investigation; no attempt is made here to explain the later attendance of this particular group of beginners. It is merely pointed out that for the age groups from 78 to 84 months the correlation between age and score is -.37, showing a decrease in performance with increase in age.

For the purpose of illustration only, regression lines have been computed (a) for the segment of the population from age 66 to age 78 months, and (b) from age 78 to 84 months. Taking scores on the Y axis and age on the X axis, these lines are the loci of the equations

(a) $Y = 1.15X - 20.6$, with a standard error of estimate of 2.24, and

(b) $Y = -0.68X + 121.4$, with a standard error of estimate of 3.73.

The line from 66 months to 78 months is roughly parallel to the upper segment of the line of Detroit norms, but at a higher level. If the regression line were computed for all the age groups from 66 to 84 months, that line would still lie above the Detroit norms, since the regression line passes through the general mean of the population, which has been computed to be an age of 73.87 months and a score of 63.32.
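Line (a) can be recomputed from Table IX by ordinary least squares. The sketch assumes that the line was fitted to the thirteen monthly means from 66 to 78 months, each counted once rather than weighted by N, and that the second age printed as 68 in the source is 69; the thesis states neither point. Under these assumptions the slope comes out at about 1.15 and the intercept at about -20.5, close to line (a). The function name is illustrative.

```python
# Table IX monthly means for ages 66 to 78 months (segment (a)); the second "68" read as 69.
AGES = list(range(66, 79))
MEANS = [58.2, 54.5, 58.0, 57.2, 60.4, 58.4, 62.8, 62.8, 66.4, 66.1, 66.5, 67.8, 69.1]


def least_squares_line(x, y):
    """Return (slope, intercept) of the ordinary least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


slope, intercept = least_squares_line(AGES, MEANS)
print(f"Y = {slope:.2f}X {intercept:+.2f}")
```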

It is the general conclusion of this section of the study that satisfactory norms, or at least norms superior to those published by the Detroit authors, cannot be derived from the data available in the files of our Alberta school systems. The distributions are too seriously skewed, and if we fall back on purely empirical data we find the phenomenon of a decrease in test performance with increase in chronological age.

It may be presumed that this is due to a selective factor, the precise nature of which would require a separate study to determine, possibly by interview and questionnaire.

Before leaving this section, it may be worthwhile to note an assumption in the norms of the Detroit Beginning First Grade Intelligence Test that seems difficult to justify. The first score stands for a mental age of 42 months. The second score represents an increase of one month, and the third no increase at all. It is possible to reach any of these three mental ages by getting only one item correct, depending on that item's weighting.


In summary, it may be well to restate the two reasons why it is not feasible to develop Alberta norms from the data available in this study: (1) the combined distribution was found to be skewed to the extent of -.48 by Pearson's coefficient of skewness, and (2) from the age of 78 months to 84 months there was found to be a negative correlation of test score with chronological age, r = -.37. The first condition precludes the use of the normal curve for the development of norms; the second precludes the use of empirical data for their development.


CHAPTER  VIII 


CASE  STUDIES 

The case study method is a fruitful one in psychological research. It is not so productive of wide generalizations as statistical studies of large samples, but it may give the research worker an insight that is not to be gained from the analysis of large masses of data, where small but perhaps significant details are lost to view. Since, however, comprehensive case studies of seventeen individuals would extend this part of the study to a disproportionate length, tables will be presented giving the essential information in concise form. Some of this information will be found in Table X.

The cases which furnished the material for this study were not selected in such a way as to be considered representative of the population at large. They are the beginning grade one pupils met with during four years of teaching in rural schools. Only three were girls. The mean Detroit Beginning First Grade I.Q. was less than 100. Nevertheless, the observation of this small group has been a rich and rewarding experience. A summary of the I.Q. ratings on the tests administered during the progress of this study is given in Table XI.


TABLE X

SYNOPSIS OF RELEVANT DATA FROM THE STUDY OF 17
CASES IN THE VERMILION RURAL SMALL SAMPLE

Case | Birthdate (Da. Mo. Yr.) | Sex | Nationality, Maternal | Nationality, Paternal | Health | Personality Problems
a | 6 May '44 | M | Scottish | English | enuresis | unwanted child
b | 23 June '46 | M | Irish | English | good | shyness and timidity
c | 15 July '45 | F | Polish | Polish | excellent | shy and inactive
d | 2 Sept. '46 | M | Ukrainian | Polish | good | inclined to mischief
e | 29 Sept. '46 | M | English | Polish | good | recent death of father
f | 28 Jan. '46 | M | German | Ukrainian | tonsillitis | displaced family
g | 11 May '46 | M | Ukrainian | Ukrainian | lack of bowel control | no English
h | 28 Apr. '45 | M | Norwegian | English | excellent | egocentricity
i | 9 Apr. '45 | M | Irish | English | good | excessive shyness
j | 5 Oct. '45 | M | English | Polish | excellent | quarrelsome
k | 14 Apr. '43 | F | Ukrainian | Polish | good | shy, quiet child
l | 31 July '45 | M | Polish | Ukrainian | excellent | no English
m | 12 July '45 | M | Irish | German | immaturity | problem of reversals
n | 10 Aug. '47 | M | Irish | English | overweight | shyness, passivity
o | 2 Sept. '47 | F | Ukrainian | Polish | excellent | well adjusted
p | 26 Jan. '48 | M | English | Polish | good | immaturity
q | 17 Mar. '47 | M | Polish | Ukrainian | excellent | negativism



TABLE XI

I.Q. RATINGS FROM TESTS ADMINISTERED TO SMALL SAMPLE
GROUP DURING FIRST 7 MONTHS OF SCHOOL EXPERIENCE

Case | Detroit Beginning, start of year | Detroit Beginning, retest at 6 months | Stanford-Binet Individual, Form L | Detroit Advanced, 7 months | Otis Alpha, Form A, 6 months | California Test of Mental Maturity
a | 87 | 108 | 98 | 107 | 105 | -
b | 80 | 106 | 95 | 110 | 109 | -
c | 114 | 107 | 99 | 110 | 87 | 95
d | 113 | 128 | 108 | 127 | 115 | 96
e | 128 | 132 | 114 | 138 | 122 | 97
f | 65 | 86 | 85 | 97 | 88 | 62
g | 76 | 92 | 83 | 100 | 85 | 84
h | 103 | 126 | 112 | 132 | 109 | 100
i | 79 | 98 | 87 | 105 | 86 | 88
j | 123 | 135 | 116 | 131 | 118 | [illegible]
k | 81 | 118 | 96 | 114 | 100 | -
l | 88 | 122 | 92 | 111 | 102 | 100
m | 78 | 87 | 88 | 100 | 90 | 88
n | 88 | 94 | 92 | 106 | 98 | -
o | 135 | 118 | 100 | 128 | - | -
p | 99 | 114 | 107 | 122 | 89 | -
q | 91 | 110 | 89 | 105 | 84 | -
Mean | 95.76 | 110.65 | 97.71 | 114.29 | 99.07 | 94.73
S.D. | 19.75 | 14.99 | 10.2 | 12.40 | 12.59 | 19.02

The product-moment coefficients of correlation were computed for the Detroit Beginning vs. Detroit Beginning Retest, for the Detroit Beginning vs. Detroit Advanced, for the Detroit Beginning vs. Stanford-Binet Individual, for the Detroit Advanced vs. Stanford-Binet Individual, for the Stanford-Binet Individual vs. California Test of Mental Maturity, and for the Stanford-Binet Individual vs. Otis Alpha Form A. Table XII contains a summary of the statistical comparison of these tests.
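The product-moment coefficient can be checked directly against the ratings. The Python sketch below is a minimal illustration, not the author's procedure: it applies the usual deviation-score formula to the seventeen Detroit Beginning and Stanford-Binet ratings as transcribed from Table XI, reading the illegible entries for cases c and e as 114 (an assumption; they are the only values that reproduce the published means of 95.76 and 97.71). On these figures it gives r of about .788, against .78 in Table XII, a small gap that rounding or transcription could account for.

```python
import math

# Ratings for cases a to q, as transcribed from Table XI. The illegible entries
# for case c (Detroit Beginning) and case e (Stanford-Binet) are read as 114,
# the only values that reproduce the published column means (an assumption).
DETROIT_BEGINNING = [87, 80, 114, 113, 128, 65, 76, 103, 79,
                     123, 81, 88, 78, 88, 135, 99, 91]
STANFORD_BINET = [98, 95, 99, 108, 114, 85, 83, 112, 87,
                  116, 96, 92, 88, 92, 100, 107, 89]


def pearson_r(x, y):
    """Product-moment coefficient of correlation of two paired lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(DETROIT_BEGINNING, STANFORD_BINET)
print(f"Detroit Beginning vs. Stanford-Binet: r = {r:.3f}")
```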


TABLE XII

STATISTICAL COMPARISON OF DETROIT BEGINNING, RETEST
OF DETROIT BEGINNING, DETROIT ADVANCED, STANFORD-BINET
INDIVIDUAL, OTIS ALPHA FORM A, AND
CALIFORNIA TEST OF MENTAL MATURITY

Tests | r | Difference of Means | S.E. of Difference of Means | N | C.R.
Detroit Beginning vs. Retest | .81 | 14.89 | 2.91 | 17 | 5.12
Detroit Beginning vs. Detroit Advanced | .85 | 18.53 | 1.78 | 17 | 10.41
Detroit Beginning vs. Stanford-Binet | .78 | 1.95 | 3.46 | 17 | .56
Detroit Advanced vs. Stanford-Binet | .93 | 16.58 | 1.13 | 17 | 14.67
Stanford-Binet vs. California M.M. | < .30 | 3.36 | 5.26 | 11 | .63
Stanford-Binet vs. Otis Alpha Form A | .70 | [illegible] | 2.59 | 15 | .35


All the correlation coefficients in Table XII are positive. The highest is between the Detroit Advanced and the Stanford-Binet Individual. The lowest mean I.Q. was that obtained from the California Test of Mental Maturity, which, like the Detroit Beginning, was administered at the beginning of the school year. Close to that of the California Test of Mental Maturity is the mean of the Detroit Beginning, at 95.76. Both of these means are below the normal I.Q. of 100. A study of Table X will reveal a partial explanation of this circumstance. It will be noted that three of the cases, "f", "g", and "l", began grade one without a knowledge of the English language. This almost certainly had a depressing effect on their scores and, consequently, upon the means of so small a sample. Learning a language takes time; the lack of facility in English expression was apparent on the Stanford-Binet Individual tests even after six months of school experience. It will be noted in Table XI that the Detroit Beginning First Grade Intelligence Test and the California Test of Mental Maturity also exhibited the largest standard deviations, 19.75 for the Detroit and 19.02 for the California Test of Mental Maturity.

There is a significant difference between the mean of the Detroit Beginning and its retest six months later. The difference between the mean of the Detroit Beginning and that of the Detroit Advanced, administered seven months later, is also highly significant. This might lead to the interpretation that there had been a real and highly significant saltation in mean I.Q. However, a measure of intelligence from an individual examination is a more valid measure than can be furnished by a group intelligence test for the same individual. The mean Stanford-Binet I.Q. of 98 ought therefore to be considered the best estimate of the intelligence of this group. The fact that the I.Q. ratings from the Otis Alpha, Form A, administered six months after the beginning of school, correspond closely with the ratings on the Stanford-Binet corroborates the conclusion that the most valid mean I.Q. of this group lies very close to 98. The mean I.Q.'s on the Detroit Beginning at the start of the year and on the California Test of Mental Maturity, also given at the start of the year, are slightly lower, but the differences between these means and the mean of 98 on the Stanford-Binet are not significant.

If an I.Q. of 98 is the best estimate of the intelligence of this group, how are we to account for a mean I.Q. of 111 obtained from the Detroit Beginning retest, or the I.Q. of 114 obtained from the Detroit Advanced after six months? The differences between the mean I.Q.'s of the Detroit Beginning and the Detroit Advanced, and of the Detroit Advanced and the Stanford-Binet, are highly significant and certainly not due to chance. Let us examine Table XII and consider these sets of tests separately. For the Detroit Beginning and its retest the critical ratio is 5.12, greater than the C.R. of 2.92 required at the one per cent level of significance for an N of 17. For the difference between the means of the Detroit Beginning and the Detroit Advanced the C.R. is 10.41, which is again highly significant. Similarly, for the difference between the means of the Detroit Advanced and the Stanford-Binet, we have a C.R. much larger than can reasonably be accounted for by chance variation or errors of sampling. The most obvious explanation of these large, highly significant differences is that the Detroit tests measure factors which are subject to significant modification by school experience.
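The critical ratios in Table XII are simply each difference between means divided by its standard error. A minimal Python sketch follows. The 2.92 threshold is the one per cent value the text quotes for an N of 17; the helper `se_diff_correlated` is the usual textbook formula for two means obtained from the same pupils, included as an assumption about how such errors are normally found, not as a reconstruction of the author's working. The standard errors themselves are taken from Table XII as printed.

```python
import math


def critical_ratio(diff_of_means, se_of_diff):
    """C.R. = difference between two means divided by its standard error."""
    return diff_of_means / se_of_diff


def se_diff_correlated(se1, se2, r):
    """Standard error of the difference between two correlated means
    (same pupils tested twice): sqrt(se1^2 + se2^2 - 2 * r * se1 * se2)."""
    return math.sqrt(se1 ** 2 + se2 ** 2 - 2 * r * se1 * se2)


CR_01_N17 = 2.92  # one per cent value for an N of 17, as quoted in the text

# Differences and standard errors as printed in Table XII
for label, diff, se in [
    ("Detroit Beginning vs. Detroit Advanced", 18.53, 1.78),
    ("Detroit Beginning vs. Stanford-Binet", 1.95, 3.46),
]:
    cr = critical_ratio(diff, se)
    verdict = "significant" if cr > CR_01_N17 else "not significant"
    print(f"{label}: C.R. = {cr:.2f} ({verdict} at the .01 level)")
```

A positive r shrinks the standard error of the difference, which is why two sets of scores from the same children can show a significant difference that would not appear significant if the sets were treated as independent.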

The Detroit Advanced is proposed by the authors as an alternative to the Detroit Beginning First Grade Intelligence Test. It would seem, however, that performance on the Detroit Advanced is also subject to considerable modification by school experience, if the highly significant increase in I.Q. is to be accounted for. The Detroit Beginning administered at the beginning of the first term seems to be a more valid measure than the Detroit Advanced administered after seven months. The correlation of .94 between the Stanford-Binet Individual and the Detroit Advanced is high, but there is a mean difference of 16.58, which is highly significant. The correlation between the Stanford-Binet and the Detroit Beginning at the start of the year is .78, but the difference between the means is not significant.

In summarizing the conclusions from this part of our study it must be borne in mind that it deals with a small sample. The findings are proposed, not as being proved beyond all doubt, but rather as tentative, and as pointing in directions where further search and substantiation might prove fruitful. The primary conclusion to be derived from this section of our study is that, after a period of six months, the I.Q.'s from the Detroit Beginning First Grade Intelligence Test show a considerable increase over those obtained at the beginning. How this increase is distributed over the six months has not been investigated, but it seems probable that the curve is a type of learning curve rather than a linear function. Care ought therefore to be exercised that the Detroit Beginning First Grade Intelligence Test be administered at a point in the child's school experience that allows as much time as is necessary for the child to "feel at home" in the school situation, but before the abnormal increase in Detroit I.Q. due to school experience begins to become appreciable.

A second conclusion is that the mean I.Q.'s obtained on the Detroit Beginning First Grade Intelligence Test and the Detroit Advanced First Grade Intelligence Test at a period of six months from the beginning of school experience are abnormally high.

A  third  conclusion  is  that  the  performance  on  a  paper  and 
pencil  test  will  become  more  stable  as  the  child’s  school  experience 
progresses.  The  evidence  in  support  of  this  contention  is  that  the 
standard  deviation  on  the  Detroit  Beginning  First  Grade  Intelligence 
Test  was  larger  at  the  beginning  than  after  a  period  of  six  months. 

To recapitulate, the findings of this section of the study are: (1) there is a considerable increase in I.Q. if the Detroit Beginning is administered later in the year rather than at the beginning; (2) the I.Q.'s obtained from both the Detroit Beginning First Grade Intelligence Test and the Detroit Advanced First Grade Intelligence Test after the middle of the school year are abnormally high; (3) the performance on the Detroit Beginning First Grade Intelligence Test becomes more stable after a period of school experience.




CHAPTER  IX 


DO  THE  DETROIT  NORMS  YIELD  AN  I.Q. 

THAT  IS  TOO  HIGH? 

The concept of normal I.Q. is that of an average of intellectual ability. This average has been arbitrarily established at I.Q. 100. How close does the mean of a population of Detroit I.Q.'s approach this mean of 100? We have studied the Detroit Beginning First Grade Intelligence Test using the weighted and unweighted scores, and have examined the I.Q. scores in small samples. In this part of the study it is proposed to examine the distribution of I.Q. scores in a large sample. The sample of 3512 Detroit Beginning First Grade Intelligence Test I.Q. scores from the Edmonton system for the year 1953 will be used for that purpose.

The frequency distribution of these I.Q. scores is given in Table XIII.

The mean age of this sample was found to be 6 years and 3.19 months, with a standard deviation of 5.57 months. The mean weighted score was 65.53, with a standard deviation of 18.07. Both the distribution of ages and that of scores are skewed, positively for ages and negatively for scores. Using Pearson's coefficient of skewness, the skewness of the distribution of ages was found to be .67 and the skewness of the distribution of scores was found to be -.55.


TABLE XIII

FREQUENCY DISTRIBUTION OF I.Q. SCORES OF
EDMONTON 1953 SAMPLE

Mid Point | Frequency
157 | 4
152 | 5
147 | 26
142 | 48
137 | 91
132 | 149
127 | 230
122 | 314
117 | 393
112 | 465
107 | 380
102 | 412
97 | 219
92 | 300
87 | 228
82 | 116
77 | 53
72 | 32
67 | 22
62 | 12
57 | 7
52 | 5
47 | 1
Total | 3512

The range of these I.Q. scores was from 46 to 157. The mean I.Q. was computed to be 108.18, with a standard deviation of 16.45. The median was computed to be 109.09. Pearson's coefficient of skewness was computed to be -.17. This coefficient of skewness is much smaller than those of the distributions of ages and of scores, since the distribution of intelligence quotients is the resultant of a function of ages, which is positively skewed, and a function of test performance, which is negatively skewed.
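The grouped-data statistics quoted above can be reproduced from Table XIII. The sketch below is a minimal Python illustration rather than the author's hand method. It treats each class as 5 I.Q. points wide, with real limits at the mid-point plus or minus 2.5 (an assumption about the class limits), and uses the form of Pearson's coefficient 3(M - Mdn)/S.D. With those choices it returns a mean of 108.18, an S.D. of 16.45, a median of 109.09 and a skewness of -.17, matching the text.

```python
import math

# Table XIII: class mid-points (I.Q.) from 157 down to 47, with frequencies
MIDPOINTS = list(range(157, 46, -5))
FREQS = [4, 5, 26, 48, 91, 149, 230, 314, 393, 465, 380, 412,
         219, 300, 228, 116, 53, 32, 22, 12, 7, 5, 1]
WIDTH = 5  # class interval in I.Q. points


def grouped_mean_sd(mids, freqs):
    """Mean and standard deviation (divisor N) of a grouped distribution."""
    n = sum(freqs)
    mean = sum(m * f for m, f in zip(mids, freqs)) / n
    var = sum(f * (m - mean) ** 2 for m, f in zip(mids, freqs)) / n
    return mean, math.sqrt(var)


def grouped_median(mids, freqs, width=WIDTH):
    """Median by linear interpolation within the median class, taking each
    class's real lower limit as mid-point - width / 2 (an assumption)."""
    half = sum(freqs) / 2
    cum = 0
    for mid, f in sorted(zip(mids, freqs)):  # lowest class first
        if cum + f >= half:
            return (mid - width / 2) + (half - cum) / f * width
        cum += f
    raise ValueError("empty distribution")


def pearson_skewness(mean, median, sd):
    """Pearson's coefficient of skewness, 3 * (mean - median) / S.D."""
    return 3 * (mean - median) / sd


mean, sd = grouped_mean_sd(MIDPOINTS, FREQS)
median = grouped_median(MIDPOINTS, FREQS)
skew = pearson_skewness(mean, median, sd)
print(f"mean {mean:.2f}, S.D. {sd:.2f}, median {median:.2f}, Sk {skew:.2f}")
```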


Using Table XIII, some statistics have been computed for the distribution of I.Q. scores for the Edmonton sample. These statistics are presented in Table XIV.

TABLE XIV

STATISTICS OF THE DISTRIBUTION OF I.Q.
SCORES FOR THE EDMONTON 1953 SAMPLE

N: 3512
Range: 46 to 157 I.Q. points
S.D.: 16.45
Quartile Deviation: 11.75

Averages
Arithmetic Mean: 108.18
Median: 109.09
Mode: 112.00

Percentiles
10th: 86.72
25th: 95.84
50th: 109.09
75th: 119.34
90th: 123.89

Coefficients of Skewness
Pearson's: -.23
Quartile: [illegible]
Percentile: -3.78

Percentile Coefficient of Kurtosis: .32
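Each legible coefficient in Table XIV can be recovered from the table's own entries with standard formulas. In this minimal Python sketch the Pearson entry is taken to be (M - Mo)/S.D., the percentile coefficient of skewness to be (P90 + P10)/2 - P50, and the kurtosis coefficient to be Q/(P90 - P10); these are assumptions, but they give -.23, -3.785 and .32. The quartile coefficient is not attempted, and the 25th percentile is read as 95.84, the value consistent with the printed quartile deviation of 11.75.

```python
from statistics import NormalDist

# Values as printed in Table XIV
P10, P25, P50, P75, P90 = 86.72, 95.84, 109.09, 119.34, 123.89
MEAN, MODE, SD = 108.18, 112.00, 16.45

quartile_deviation = (P75 - P25) / 2            # Q
percentile_skewness = (P90 + P10) / 2 - P50     # in I.Q. points
pearson_skewness_mode = (MEAN - MODE) / SD      # (M - Mo) / S.D.
percentile_kurtosis = quartile_deviation / (P90 - P10)

# Reference value of Q / (P90 - P10) for a normal curve
unit = NormalDist()
normal_kurtosis = unit.inv_cdf(0.75) / (2 * unit.inv_cdf(0.90))

print(f"Q = {quartile_deviation:.2f}")
print(f"percentile skewness = {percentile_skewness:.3f}")
print(f"Pearson (mode) = {pearson_skewness_mode:.2f}")
print(f"Ku = {percentile_kurtosis:.2f} (normal curve: {normal_kurtosis:.3f})")
```

The kurtosis coefficient exceeds the normal-curve value of about .263, which under the usual convention for this coefficient marks a distribution somewhat flatter than normal.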

The working concept of general intelligence has been based on that of a mean of intellectual ability for the population under consideration, with 50% of the population having that ability in a lesser degree and 50% having it in a greater degree. General consent has located the arbitrary reference point for the scale of intellectual ability by defining that average as I.Q. 100. The distribution of Detroit I.Q.'s departs grossly from this widely accepted hypothetical distribution. The hypothetical distribution has 50% of the population below I.Q. 100. The Detroit I.Q. distribution, on the basis of this sample of 3512 cases, has only 30% of the cases below I.Q. 100.
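The 30 per cent figure can be checked against Table XIII by linear interpolation within the class that contains I.Q. 100. The Python sketch below assumes real class limits at the mid-point plus or minus 2.5; it gives about 29.5 per cent. For comparison, a normal curve with the sample's own mean and S.D. places about 31 per cent of cases below 100, so the shortfall from 50 per cent reflects the high mean rather than the skew alone.

```python
from statistics import NormalDist

# Table XIII: class mid-points (I.Q.) from 157 down to 47, with frequencies
MIDPOINTS = list(range(157, 46, -5))
FREQS = [4, 5, 26, 48, 91, 149, 230, 314, 393, 465, 380, 412,
         219, 300, 228, 116, 53, 32, 22, 12, 7, 5, 1]
WIDTH = 5


def percent_below(score, mids, freqs, width=WIDTH):
    """Percentage of cases below `score`, interpolating linearly inside the
    class that contains it (real limits assumed at mid-point +/- width/2)."""
    n = sum(freqs)
    below = 0.0
    for mid, f in zip(mids, freqs):
        lower, upper = mid - width / 2, mid + width / 2
        if upper <= score:
            below += f                                # whole class lies below
        elif lower < score:
            below += f * (score - lower) / width      # part of this class
    return 100.0 * below / n


observed = percent_below(100, MIDPOINTS, FREQS)
normal = 100 * NormalDist(108.18, 16.45).cdf(100)
print(f"observed below 100: {observed:.1f}%   normal curve: {normal:.1f}%")
```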

It is not only from the concept of the symmetry of the distribution of I.Q.'s about the mean of 100 that the Detroit distribution of I.Q.'s departs. Table XV gives a comparison of the distribution of Detroit I.Q.'s with the classification of David Wechsler.¹⁰


TABLE XV

COMPARISON OF DISTRIBUTION OF DETROIT I.Q.'s
WITH CLASSIFICATION OF DAVID WECHSLER

I.Q. Limits | %W | %D | fW | fD | fD - fW | (fD - fW)² | (fD - fW)²/fW
128 and over | 2.2 | 12.47 | 78 | 438 | 360 | 129600 | 1661.54
120-127 | 6.7 | 13.33 | 235 | 468 | 233 | 54289 | 231.02
111-119 | 16.1 | 21.98 | 565 | 772 | 207 | 42849 | 75.84
91-110 | 50.0 | 37.81 | 1756 | 1328 | -428 | 183184 | 104.32
80-90 | 16.1 | 10.79 | 565 | 379 | -186 | 34596 | 61.23
66-79 | 6.7 | 2.85 | 235 | 100 | -135 | 18225 | 77.55
65 and under | 2.2 | .77 | 78 | 27 | -51 | 2601 | 33.35
Total | 100.0 | 100.00 | 3512 | 3512 | | | χ² = 2244.85

In Table XV the column %W refers to the per cent in the Wechsler distribution, %D to the per cent observed in the Detroit distribution, fW to the frequency in the Wechsler distribution, and fD to the frequency in the Detroit distribution. The chi-square test which is applied in the remainder of the table may be considered almost superfluous for distributions so obviously out of fit as the two under consideration. Nevertheless, it is simple to apply and has been carried through. The chi-square test is applied with 7 - 1 = 6 df, and the null hypothesis is that there is no discrepancy of fit between the observed and theoretical distributions. The value of chi-square at the .01 level with 6 df, as given in the tables of Fisher and Yates,¹¹ is 16.81. Since the obtained χ² is enormously larger than this, the null hypothesis is rejected. The disagreement in fit is, in point of fact, gross in every category of the distribution.

¹⁰ David Wechsler, The Measurement of Adult Intelligence (1944), pp. 39-40.
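The chi-square computation of Table XV can be written out in a few lines. This Python sketch is an illustration only. It uses the Wechsler frequencies and the Detroit frequencies implied by the table's fD - fW column; summing the unrounded terms gives about 2244.84, which dwarfs the .01 value of 16.81 for 6 df quoted in the text.

```python
# Table XV: Wechsler (theoretical) and Detroit (observed) frequencies, N = 3512.
# The Detroit frequencies are fW + (fD - fW), using the table's difference column.
CATEGORIES = ["128 and over", "120-127", "111-119", "91-110",
              "80-90", "66-79", "65 and under"]
F_WECHSLER = [78, 235, 565, 1756, 565, 235, 78]
F_DETROIT = [438, 468, 772, 1328, 379, 100, 27]
CHI2_01_DF6 = 16.81  # .01 value for 6 df, as quoted in the text


def chi_square(observed, expected):
    """Pearson chi-square: sum of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


chi2 = chi_square(F_DETROIT, F_WECHSLER)
df = len(CATEGORIES) - 1  # 7 categories, 1 constraint (same total N)

for cat, o, e in zip(CATEGORIES, F_DETROIT, F_WECHSLER):
    print(f"{cat:>13}: {(o - e) ** 2 / e:8.2f}")
print(f"chi-square = {chi2:.2f} with {df} df; reject null: {chi2 > CHI2_01_DF6}")
```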

It may be argued that Wechsler's distribution is applicable only to adults, but, as Wechsler noted,¹² the difference between his classification and that of Terman is numerically small, and Terman's classification is applicable to children. Furthermore, while there may be fluctuations in individual I.Q.'s, the intelligence quotient generally tends to be fairly stable from childhood to maturity. No argument is here presented for the superiority of the Wechsler classification over that of the Detroit. It may even be in the nature of a revelation that the distribution of Detroit I.Q.'s is entirely different from the accepted classification of Terman and Wechsler.

¹¹ R.A. Fisher and Frank Yates, Statistical Tables, Table IV (1938), p. 41.

¹² David Wechsler, The Measurement of Adult Intelligence (1944), p. 39.


As far as ranking goes, it has not been proven that the Detroit Beginning First Grade Intelligence Test is not a valid test of intelligence. A revision of the norms, however, would seem to be in order before the assumption can be warranted that the Detroit I.Q. implies essentially the same thing as the Stanford-Binet I.Q. or the Wechsler I.Q. It seems inadvisable, however, to revise the norms of a test which lacks the span of difficulty to measure the population for which it was intended, on the basis of a population subject to a selective factor and a distribution of scores which is gravely skewed.

Is the Detroit I.Q. too high? If we assume that the population of the city of Edmonton is representative of the general population of the province, the answer is, definitely, yes. However, other studies have confirmed the finding of a mean I.Q. well above 100 for Alberta urban populations.¹³ The truth may well be that it is not a clear-cut case of either the Detroit I.Q. being too high or the Edmonton population being a superior group within the parent population. Both may to an extent be true. The further elucidation of this problem furnishes a rich field for future research.

¹³ T. James Reid and George R. Conquest, "A Survey of Language Achievement of Alberta School Children," Alberta Journal of Educational Research 1:2 (June 1955), pp. 43-46.

Clarence E. Climenhaga, "A Survey of Arithmetical Achievement of Grade Eight Pupils in Alberta Schools," Alberta Journal of Educational Research 1:4 (Dec. 1955), pp. 37-38.




CHAPTER  X 


CONCLUSIONS 

The  conclusions  of  this  thesis  will  be  presented  in  order  of 
appearance. 


Item  Difficulty 

The test showed itself deficient in the difficult items required to balance the easy items on the scale. In other words, the test was found to be too easy for the group to which it was administered. It was estimated that the sample, with a mean age of 6 years and 4 months, was 10.4 months older than the optimum placement of the test would indicate. The percentage difficulty of the items was computed and reduced to sigma scores. It was estimated that the range of difficulty of the items was only 61% of that which should have been possible with the mean and standard deviation observed.
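Reducing percentage difficulties to sigma scores means converting each item's percentage passing into a normal deviate. The text does not say which convention was followed; the Python sketch below assumes the common one in which an item's sigma value is the point on a unit normal curve above which the passing percentage lies, so that easy items receive negative values and hard items positive ones. The item percentages are hypothetical.

```python
from statistics import NormalDist


def sigma_difficulty(percent_passing):
    """Sigma (z) value of an item: the point on a unit normal curve above
    which `percent_passing` per cent of the group lies. Easy items get
    negative values, hard items positive ones. The percentage must lie
    strictly between 0 and 100."""
    p = percent_passing / 100.0
    return NormalDist().inv_cdf(1.0 - p)


# Hypothetical item percentages, for illustration only
items = {"item 1": 97.0, "item 2": 84.13, "item 3": 50.0, "item 4": 30.0}
sigmas = {name: sigma_difficulty(pct) for name, pct in items.items()}
spread = max(sigmas.values()) - min(sigmas.values())

for name, z in sigmas.items():
    print(f"{name}: {items[name]:5.1f}% passing -> sigma {z:+.2f}")
print(f"range of difficulty: {spread:.2f} sigma")
```

Items passed by every pupil or by none have no finite sigma value and would have to be set aside before the range of difficulty is taken.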

Discriminatory Power of the Items

The percentages of successes in the upper and lower 27 per cent of the Vermilion sample of 250 cases were computed, and discrimination indices were determined using three different methods, those of Flanagan, Lawshe, and Davis. It was found that the items in the test did discriminate between the higher and lower levels of ability. The three methods agreed well in the ranking of the discrimination indices. The rank-correlation coefficients were computed to be: Flanagan vs. Lawshe, .98; Lawshe vs. Davis, .98; and Flanagan vs. Davis, .99.
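The agreement among the three methods was measured by rank correlation. A minimal Python sketch of the rank-difference formula ρ = 1 - 6Σd²/(n(n² - 1)) follows. It assumes there are no tied indices (ties would need averaged ranks), and the discrimination indices in the example are hypothetical, not the thesis data.

```python
def ranks(values):
    """Rank 1 for the largest value; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman_rho(x, y):
    """Rank-difference correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical discrimination indices for five items under two methods;
# the methods disagree only on the order of the first two items.
method_a = [0.50, 0.42, 0.35, 0.20, 0.10]
method_b = [0.45, 0.48, 0.30, 0.22, 0.05]
print(f"rho = {spearman_rho(method_a, method_b):.2f}")
```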


Validity 

In the previous section the Detroit Beginning First Grade Intelligence Test was shown to be a valid test in the sense that the items on the test really discriminate between the higher and lower levels of ability. In this section a correlation of .54 was computed between the Detroit Beginning and the Kuhlmann-Anderson for 159 cases from the Edmonton system. For the small sample, a correlation of .74 was computed between the Detroit Beginning and the Stanford-Binet for 17 cases. With reference to the discriminative power of the items and correlation with other tests, the Detroit Beginning First Grade Intelligence Test is a valid test. In the small sample study and in Chapter IX, however, it was shown that a Detroit I.Q. of 100 is not the mean of the population. In this respect it departs from accepted practice and is not a valid test.


Reliability 

The split-half reliability coefficient was computed for a small sample of 65 cases from the town of Vermilion. It was found to be .89. Using the percentages of correct responses in the 250 cases of the Vermilion sample and the S.D. of the sample, a reliability coefficient derived by means of the Kuder-Richardson formula was found to be .91. Added to the information published in the manual of the test, these coefficients would indicate a satisfactory degree of reliability for the test.
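Both reliability figures rest on short formulas. This section does not state which Kuder-Richardson formula was used; formula 20, which needs only the proportion passing each item and the S.D. of total scores, is assumed in the Python sketch below, along with the Spearman-Brown correction usually applied to a split-half coefficient. The numbers in the example are hypothetical.

```python
def spearman_brown(r_half):
    """Step a correlation between two half-tests up to full-test length."""
    return 2 * r_half / (1 + r_half)


def kuder_richardson_20(p_values, total_sd):
    """KR-20: k/(k-1) * (1 - sum(p*q) / variance of total scores),
    where p is the proportion passing each item and q = 1 - p."""
    k = len(p_values)
    sum_pq = sum(p * (1 - p) for p in p_values)
    return k / (k - 1) * (1 - sum_pq / total_sd ** 2)


# Hypothetical figures, for illustration only
print(f"split-half r of .80 stepped up: {spearman_brown(0.80):.3f}")
print(f"KR-20 for four items: {kuder_richardson_20([0.9, 0.7, 0.5, 0.3], 1.5):.3f}")
```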




Norms 

The data available were not found suitable for the development of Alberta norms. The distribution of scores was found to be too seriously skewed for norms to be developed using the normal curve.

A  selective  factor  was  found  to  operate  in  the  population  that 
gave  a  positive  correlation  of  test  score  with  chronological  age  for 
the  earlier  age  groups  and  a  negative  correlation  of  test  score  with 
chronological  age  for  the  later  age  groups.  This  selective  factor 
renders  the  sample  unsuitable  for  the  development  of  norms.  However, 
in  the  section  on  item  difficulty  it  was  shown  that  the  test  was  too 
easy  for  the  age  group  to  which  it  is  being  administered  in  Alberta. 

If  steps  were  taken  to  eliminate  this  selective  factor  it  would 
facilitate  the  development  of  Alberta  norms. 

Case  Studies 

The conclusion from this section of the study was that performance on the Detroit Beginning First Grade Intelligence Test shows significant improvement with school experience. About the middle of the school year, the Detroit I.Q. was found to be significantly higher than such other measures of intellectual capacity as the Stanford-Binet Individual and the Otis Alpha group intelligence tests.

Is the Detroit I.Q. too High?

An analysis of 3512 Detroit Beginning First Grade Intelligence Test I.Q. scores showed a mean I.Q. of 108.18 with an S.D. of 16.45. Seventy per cent of the I.Q. scores were above I.Q. 100. The distribution of Detroit I.Q. scores was found to be significantly different from the theoretical distribution of David Wechsler, having too many cases in the upper, too few in the lower, and too few in the middle categories. The Detroit follows a classification that is peculiar to itself and differs in important respects from the classification of Terman and Wechsler.

General  Conclusion 

The findings of this study seem to emphasize the wisdom of not taking a test tailored to fit a particular population and assuming uncritically that it will suit our own requirements. It may do so, but we should not assume, without using standard statistical checks, that it will. We should not only subject our measurements to rigorous criticism, but we should also examine our measuring instruments closely. A musical instrument will get out of tune; a psychometric instrument may get out of adjustment with its population. The critical periodic examination of our standardized educational measuring instruments is one of the important fields of modern educational research.


BIBLIOGRAPHY

Statistical

Fisher, R.A. Statistical Methods for Research Workers. New York: Hafner Publishing Co., 1950.

Fisher, R.A. and Yates, F. Statistical Tables. New York: Hafner Publishing Co., 1948.

Garrett, H.E. Statistics in Psychology and Education. Toronto: Longmans, Green and Co., 1951.

Kenney, J.F. and Keeping, E.S. Mathematical Statistics, Part II. New York: D. Van Nostrand Co., 1940.

Lindquist, E.F. Statistical Analysis in Educational Research. Boston: Houghton Mifflin Co., 1940.

Richardson, C.H. An Introduction to Statistical Analysis. New York: Harcourt, Brace and Co., 1944.

Wilks, S.S. Elementary Statistical Analysis. Princeton: Princeton University Press, 1949.

Psychometric

Buros, O.K. Mental Measurements Yearbook. Highland Park, N.J.: Gryphon Press, 1953.

Climenhaga, Clarence E. "A Survey of Arithmetical Achievement of Grade Eight Pupils in Alberta Schools." The Alberta Journal of Educational Research I:4 (December, 1955).

Cronbach, Lee J. Essentials of Psychological Testing. New York: Harper and Brothers, 1949.

Davis, F.B. Item Analysis Data. Cambridge: Graduate School of Education, Harvard University, 1949.

Flanagan, J.C. "General Considerations in the Selection of Test Items and a Short Method of Estimating the Product-Moment Coefficient from Data at the Tails of the Distribution." Journal of Educational Psychology (December, 1939).

Kuhlmann, Frederick. Kuhlmann-Anderson Intelligence Tests for Ages Six to Maturity. Instruction Manual, 5th ed. Minneapolis: Educational Publishers, 1942.

Lawshe, Charles H., Jr. Principles of Personnel Testing. Toronto: McGraw-Hill Book Co., 1948.

Mursell, James L. Psychological Testing. New York: Longmans, Green and Co., 1950.

Reid, T. James and Conquest, George R. "A Survey of Language Achievement of Alberta School Children." The Alberta Journal of Educational Research I:2 (June, 1955).

Terman, L.M. and Merrill, M.A. Measuring Intelligence. New York: Houghton Mifflin Co., 1937.

Thurstone, L.L. Primary Mental Abilities. Chicago: The University of Chicago Press, 1938.

Wechsler, David. The Measurement of Adult Intelligence. Baltimore: The Williams & Wilkins Co., 1944.




APPENDIX 




TABLE OF NORMS

Score | M.A. | Score | M.A. | Score | M.A. | Score | M.A.
1 | 42.0 | 26 | 53.5 | 51 | 66.6 | 76 | 82.7
2 | 42.4 | 27 | 54.0 | 52 | 67.2 | 77 | 83.5
3 | 42.8 | 28 | 54.5 | 53 | 67.8 | 78 | 84.1
4 | 43.2 | 29 | 55.0 | 54 | 68.5 | 79 | 84.8
5 | 43.6 | 30 | 55.5 | 55 | 69.0 | 80 | 85.5
6 | 44.0 | 31 | 56.0 | 56 | 69.6 | 81 | 86.2
7 | 44.5 | 32 | 56.5 | 57 | 70.2 | 82 | 86.9
8 | 44.8 | 33 | 57.0 | 58 | 70.9 | 83 | 87.6
9 | 45.2 | 34 | 57.5 | 59 | 71.4 | 84 | 88.3
10 | 45.6 | 35 | 58.0 | 60 | 72.0 | 85 | 89.0
11 | 46.0 | 36 | 58.5 | 61 | 72.6 | 86 | 89.7
12 | 46.5 | 37 | 59.0 | 62 | 73.2 | 87 | 90.5
13 | 47.0 | 38 | 59.5 | 63 | 73.8 | 88 | 91.2
14 | 47.5 | 39 | 60.0 | 64 | 74.5 | 89 | 91.9
15 | 48.0 | 40 | 60.5 | 65 | 75.0 | 90 | 92.6
16 | 48.5 | 41 | 61.0 | 66 | 75.7 | 91 | 93.3
17 | 49.0 | 42 | 61.5 | 67 | 76.5 | 92 | 94.0
18 | 49.5 | 43 | 62.0 | 68 | 77.1 | 93 | 94.8
19 | 50.0 | 44 | 62.3 | 69 | 77.8 | 94 | 95.6
20 | 50.5 | 45 | 63.0 | 70 | 78.5 | 95 | 96.5
21 | 51.0 | 46 | 63.6 | 71 | 79.2 | 96 | 97.2
22 | 51.5 | 47 | 64.2 | 72 | 79.9 | 97 | 98.0
23 | 52.0 | 48 | 64.8 | 73 | 80.6 | 98 | 98.9
24 | 52.5 | 49 | 65.5 | 74 | 81.3 | 99 | 99.8
25 | 53.0 | 50 | 66.0 | 75 | 82.0 | 100 | 101.7

This table has been built up from the data of the combined distributions of Edmonton, Vermilion, and Westlock, using standard scores and the table on page 42 of Terman and Merrill, Measuring Intelligence. It is based on 4400 cases. Its use is not recommended because of the low maximum I.Q.'s possible for older ages, but it will give a better fit with the Terman classification of intelligence than the table in the test manual.
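The warning about low maximum I.Q.'s can be made concrete. Assuming the ratio I.Q., 100 × M.A./C.A. with both ages in months (an assumption; the note does not state the formula), a perfect score of 100 corresponds to an M.A. of 101.7 months in the table above. The highest I.Q. the table can award therefore falls as chronological age rises: about 141 at 6 years 0 months, 130 at 6 years 6 months, and 121 at 7 years 0 months. A minimal Python sketch:

```python
# A few Score -> M.A. (in months) entries from the Table of Norms above
NORMS_MA = {1: 42.0, 50: 66.0, 75: 82.0, 100: 101.7}
MAX_SCORE = 100


def ratio_iq(mental_age_months, chronological_age_months):
    """Ratio I.Q. = 100 * M.A. / C.A., both in months (assumed formula)."""
    return 100.0 * mental_age_months / chronological_age_months


def highest_possible_iq(ca_months):
    """I.Q. earned by a perfect paper at the given chronological age."""
    return ratio_iq(NORMS_MA[MAX_SCORE], ca_months)


for years, months in [(6, 0), (6, 6), (7, 0)]:
    ca = 12 * years + months
    print(f"C.A. {years}-{months}: highest possible I.Q. {highest_possible_iq(ca):.1f}")
```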




